1. Motivation
- What is your dataset?
For our project we used multiple datasets. The main ones are 'Historical Hourly Galicia Weather Dataset Recorded between 01-01-2000 and 30-04-2024' and 'Detected Wildfire Incidents and their Severity Dataset for the years between 2001-2022'. We used these two datasets in our machine learning model to predict the critical thresholds at which wildfire-correlated weather parameters might trigger a wildfire; this is explained in detail in the Data Analysis section. We also used other datasets, such as:
- 'Total pollutant gases and particles released from wildfires in Galicia between 2002-2023' (to showcase the environmental impact of wildfires)
- 'Ecologically Irreplaceable Areas of Galicia' (indicates zones to be protected against any harm)
- 'High Wildfire Risk Zones of Galicia' (indicates zones prone to wildfires)
- 'Predicted weather data for 01-22 May 2024' (for spotting future dates on which the wildfire-correlated weather parameters reach the danger levels obtained from the ML model).
Finally, we used some datasets showing regional tree cover extent, fire alert counts and forest loss due to wildfires to inform the audience.
- Why did you choose this/these particular dataset(s)?
The reasons for choosing each dataset are described below:
- Historical Hourly Galicia Weather Dataset Recorded between 01-01-2000 and 30-04-2024:
The relationship between weather parameters and wildfires is widely researched for the purpose of wildfire prediction. For instance, there is a rule of thumb called the 30-30-30 rule for temperature, humidity and wind speed, which can be used for fire prediction as stated in the source. (Aug 8, 2018. How the 30-30-30 Crossover Rule affects the threat of a wildfire sparking https://www.kelownanow.com/watercooler/news/news/Okanagan/%20How_the_30_30_30_Crossover_Rule_affects_the_threat_of_a_wildfire_sparking/#fs_136857)
Therefore we wanted to investigate the weather parameters that we think are correlated with wildfire incidents. The possible effects of the weather parameters in our dataset are explained below:
- High temperatures can dry out vegetation, making it more susceptible to ignition and increasing the likelihood of fires.
- Low humidity levels can dry out vegetation, making it more flammable and contributing to the rapid spread of fires.
- Evapotranspiration parameter indicates the amount of water lost from the soil and vegetation, affecting fuel moisture content and fire risk.
- Vapour Pressure Deficit measures the difference between the amount of moisture in the air and the maximum amount of moisture the air can hold, influencing vegetation dryness and fire behavior.
- High wind speeds can accelerate the spread of fires by carrying embers and flames, making containment efforts more challenging.
- Elevated soil temperatures can dry out vegetation and contribute to the overall flammability of the environment.
- Low soil moisture levels can lead to drier vegetation, increasing the likelihood of fires and their intensity.
- Direct Normal Solar Irradiance can dry out vegetation and contribute to the overall fire risk in an area.
Source for the dataset: https://open-meteo.com/
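As an illustration of the 30-30-30 rule of thumb mentioned above, the following minimal sketch flags readings where temperature is at least 30 °C, relative humidity is at most 30 % and wind speed is at least 30 km/h. The function name, the inclusive comparisons and the sample readings are assumptions made for this example; they are not taken from our datasets or from the cited source.

```python
def crossover_30_30_30(temperature_c, relative_humidity_pct, wind_speed_kmh):
    """Return True when all three 30-30-30 conditions hold at the same time.

    Assumption: the limits are treated as inclusive (>= 30 / <= 30).
    """
    return (
        temperature_c >= 30.0
        and relative_humidity_pct <= 30.0
        and wind_speed_kmh >= 30.0
    )


# Hypothetical readings: (temperature °C, relative humidity %, wind speed km/h)
readings = [
    (33.5, 22.0, 35.0),  # hot, dry and windy
    (31.0, 45.0, 32.0),  # hot and windy, but too humid
    (18.0, 80.0, 12.0),  # mild and humid
]

flags = [crossover_30_30_30(*reading) for reading in readings]
print(flags)  # [True, False, False]
```

A rule like this only gives a coarse warning; the correlation analysis and model described later look at more parameters than these three.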
- Detected Wildfire Incidents and their Severity Dataset for the years between 2001-2022:
When working with the dynamics of wildfires in the Galicia region (and their relation to climate change), it is important to select the appropriate satellite data. The Moderate Resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS) are two significant instruments used for remote sensing and scientific research. (Joseph M. Smith. Apr 6, 2022. VIIRS Instruments Become More Essential As Terra and Aqua Drift from their Traditional Orbits. https://www.earthdata.nasa.gov/learn/articles/modis-to-viirs-transition) Our project opts for MODIS data for several reasons. One of the primary reasons is the continuity of MODIS data. MODIS has been operational since 1999, providing over two decades of consistent and reliable data on various environmental parameters, including wildfires. VIIRS, on the other hand, has a higher spatial resolution but only became operational in 2011. Although MODIS delivers coarser resolution (250-1000 m) than VIIRS, it remains more than adequate for our objective of detecting and analyzing wildfires at a regional scale. Because of its late deployment, VIIRS cannot provide the long-term record we need for our historical analysis.
We consider fire radiative power (FRP) a more effective indicator of fire severity than, for example, brightness. FRP is a direct measure of the radiant heat energy released and provides a quantifiable measure of the fire's intensity, which is closely related to the severity of the incident. Brightness is a less direct measure and is more easily affected by outside factors such as viewing angle and atmospheric conditions. (Laurent, P. and Mouillot, F. and Moreno, M. V. and Yue, C. and Ciais, P. 2019. Varying relationships between fire radiative power and fire size at a global scale https://bg.copernicus.org/articles/16/275/2019/). Furthermore, FRP correlates closely with fire behavior, including rate of spread, fuel consumption and emissions.
This makes FRP our best tool for assessing severity and predicting behaviour.
Source for the dataset: https://firms.modaps.eosdis.nasa.gov/
- Total pollutant gases and particles released from wildfires in Galicia between 2002-2023 dataset:
This dataset provides detailed information on the emission factors of various species (chemical compounds and particulate matter) for wildfires. (Yongqiang Liu, Scott Goodrick, Warren Heilman. 2014. Wildland fire emissions, carbon, and climate: Wildfire–climate interactions https://www.sciencedirect.com/science/article/pii/S037811271300114X) Here is a breakdown of the components and what they represent:
- CO2 (Carbon Dioxide): The primary greenhouse gas emitted through the burning of biomass, representing the amount of carbon dioxide released per kilogram of dry matter burned.
- CO (Carbon Monoxide): A harmful pollutant that is a byproduct of incomplete combustion, contributing to air pollution and human health issues.
- CH4 (Methane): A potent greenhouse gas with a higher warming potential than CO2, though released in smaller quantities during biomass burning.
- NMHC (Non-Methane Hydrocarbons): Volatile organic compounds excluding methane, contributing to ozone formation and air quality degradation.
- H2 (Hydrogen): Released during combustion, contributing minimally to direct greenhouse gas effects but involved in atmospheric chemical reactions.
- NOx (Nitrogen Oxides, as NO): Contributing to the formation of smog and acid rain, and affecting atmospheric chemistry and climate.
- N2O (Nitrous Oxide): A powerful greenhouse gas with a long atmospheric lifetime, contributing to global warming and ozone layer depletion.
- PM2.5 (Particulate Matter with diameter less than 2.5 micrometers): Fine particles that pose significant health risks due to their ability to penetrate deep into the respiratory tract.
- TPM (Total Particulate Matter): Represents the total mass of particles emitted per kilogram of dry matter burned.
- TPC (Total Particulate Carbon, consisting of OC+BC): The sum of organic carbon (OC) and black carbon (BC), contributing to climate change and air pollution.
- OC (Organic Carbon): Part of particulate matter, affecting climate and air quality.
- BC (Black Carbon): A component of fine particulate matter, significantly affecting the climate by absorbing sunlight.
- SO2 (Sulfur Dioxide): Contributes to acid rain and has harmful health impacts.
- NH3 (Ammonia): Affects atmospheric chemical processes and particulate matter formation.
- DMCC (Dry Matter Carbon Content): Indicates the percentage of carbon in the dry matter, used for converting carbon emissions to the equivalent amount of dry matter burned.
Source of the dataset: https://gwis.jrc.ec.europa.eu/apps/country.profile/downloads
- Predicted weather data for 01-22 May 2024: This dataset provides forecast values for the weather parameters we found to be correlated with wildfires, so that dangerous time periods can be detected in advance. Using the trigger zones found with the ML model, we can flag early any forecast period in which a weather parameter reaches a level associated with high fire risk.
Source of the dataset: https://open-meteo.com/en/docs
- Total Annual Tree Cover Loss and Annual Tree Cover Loss due to Wildfires in Galicia, Spain data:
We use this dataset mainly to show the increasing yearly trend in tree cover loss and the role wildfires play in that loss in Galicia, Spain, to inform the audience about the seriousness of the wildfire issue. Not all tree cover is lost to wildfires: shifting agriculture, forestry, intentional man-made precautionary fires and urbanization also contribute to tree loss.
Source of the dataset: https://www.globalforestwatch.org/
- Mean Burned Area in ha per $km^{2}$ Area and Mean Wildfire Incidents per $km^{2}$ Area by Regions of Spain - [2002-2023] Data:
This dataset is used only to show why we chose to investigate Galicia over the other regions of Spain. Judging by the amount of burned area and the high number of wildfire incidents, Galicia is clearly more prone to severe wildfire damage.
Source of the dataset: https://gwis.jrc.ec.europa.eu/apps/country.profile/
- Total fire alerts 2001-2023 and Tree cover distribution of Galicia by its Subregions Data:
This data is used only to show the tree cover distribution and the number of recorded historical fire alerts for each subregion of Galicia, to indicate which subregions have a higher wildfire risk than the others.
Source of the dataset: https://www.globalforestwatch.org/
- Ecologically Irreplaceable Areas of Galicia Dataset:
This data is used to identify strategically important, ecologically irreplaceable zones inside Galicia, so that the reader understands which zones need extra precaution and detection effort.
Source of the dataset: https://forest-fire.emergency.copernicus.eu/
- High Wildfire Risk Zones of Galicia Dataset:
This dataset shows the high-risk zones inside Galicia, based on vegetation and wildfire modelling, to raise awareness about the critical zones in the region.
Source of the dataset: https://forest-fire.emergency.copernicus.eu/
- What was your goal for the end user's experience?
- Raising awareness about the increasing trend of wildfire incidents in Mediterranean countries by focusing on Galicia, Spain, and about their effect on tree cover loss and on the release of greenhouse gases and pollutant particles, which cause irreplaceable ecological damage to the environment and wildlife.
- Informing the audience about the relationship between weather parameters and wildfire incidents.
- Suggesting a way to predict the weather conditions that are likely to trigger wildfires: a machine learning model, trained on historical weather data and historical wildfire incidents, predicts the danger zones for the wildfire-correlated weather parameters.
2. Basic stats
- Write about your choices in data cleaning and preprocessing?
- Merging data across different time periods.
- Aggregation of data to a more extended time period where it is needed to match the time parameters of two datasets to merge.
- Merging of datasets which contain useful features for machine learning modelling
- Cleaning the missing values and dropping unnecessary columns of each dataset
- Filtering the wildfire detections to the coordinates of Galicia and to detections with high confidence values
- Running a correlation analysis and keeping only the features that are correlated with wildfire incidents
- Parsing the data for datetime and adding some time period indicating columns for the month of year, season, week of a year, day of a year, hour of a day, day of a week etc.
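The filtering, aggregation and merging steps listed above can be sketched as follows. This is a minimal illustration on tiny made-up data, not the exact code of this notebook: the bounding box, the confidence cut-off of 70 and the shortened column name `temperature_2m` are assumptions chosen for the example.

```python
import pandas as pd

# Assumed, approximate bounding box for Galicia and an assumed confidence cut-off.
LAT_MIN, LAT_MAX = 41.8, 43.8
LON_MIN, LON_MAX = -9.3, -6.7
MIN_CONFIDENCE = 70

# Tiny stand-in for the fire detection table (values are made up).
fires = pd.DataFrame({
    "acq_date": ["2017-10-15", "2017-10-15", "2017-10-16", "2017-10-16"],
    "latitude": [42.3, 42.5, 40.1, 42.9],
    "longitude": [-8.1, -7.9, -3.7, -7.5],
    "confidence": [95, 40, 90, 80],
    "frp": [120.5, 15.2, 60.0, 33.1],
})

# Tiny stand-in for the hourly weather table: three days, 24 hours each.
hours = [f"2017-10-{15 + d} {h:02d}:00" for d in range(3) for h in range(24)]
weather = pd.DataFrame({
    "time": pd.to_datetime(hours),
    "temperature_2m": [20.0] * 24 + [25.0] * 24 + [15.0] * 24,
})

# 1) Keep only detections inside the bounding box with high confidence.
in_box = (
    fires["latitude"].between(LAT_MIN, LAT_MAX)
    & fires["longitude"].between(LON_MIN, LON_MAX)
)
fires_clean = fires.loc[in_box & (fires["confidence"] >= MIN_CONFIDENCE)].copy()
fires_clean["date"] = pd.to_datetime(fires_clean["acq_date"])

# 2) Aggregate both tables to a common daily resolution.
daily_fires = (
    fires_clean.groupby("date")
    .agg(fire_count=("frp", "size"), mean_frp=("frp", "mean"))
    .reset_index()
)
daily_weather = (
    weather.assign(date=weather["time"].dt.normalize())
    .groupby("date", as_index=False)["temperature_2m"]
    .mean()
)

# 3) Merge on the shared date; days without detections get a count of zero.
merged = daily_weather.merge(daily_fires, on="date", how="left")
merged["fire_count"] = merged["fire_count"].fillna(0).astype(int)
print(merged)
```

A left merge from the weather side keeps the days without any fire detection, which matters if the merged table is later used to compare fire days against non-fire days.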
- Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis
The summary statistics for the datasets that contain the most important features are shown in the tables below. The plots used for the EDA can be found further down, in the Data Visualizations and EDA part of this notebook.
Hourly weather data (selected features):
| Variable Name | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| temperature_2m (°C) | 213288.00 | 11.73 | 6.06 | -5.80 | 7.50 | 11.30 | 15.50 | 36.30 |
| relative_humidity_2m (%) | 213288.00 | 82.25 | 14.77 | 19.00 | 73.00 | 87.00 | 94.00 | 100.00 |
| et0_fao_evapotranspiration (mm) | 213288.00 | 0.10 | 0.15 | 0.00 | 0.00 | 0.02 | 0.14 | 0.80 |
| vapour_pressure_deficit (kPa) | 213288.00 | 0.32 | 0.42 | 0.00 | 0.07 | 0.16 | 0.41 | 4.23 |
| wind_speed_10m (km/h) | 213288.00 | 12.27 | 6.68 | 0.00 | 7.10 | 11.00 | 16.50 | 55.10 |
| soil_temperature_0_to_7cm (°C) | 213288.00 | 12.57 | 6.18 | -2.40 | 8.00 | 11.90 | 16.70 | 34.40 |
| soil_moisture_0_to_7cm (m³/m³) | 213288.00 | 0.31 | 0.10 | 0.09 | 0.23 | 0.34 | 0.39 | 0.44 |
| direct_normal_irradiance_instant (W/m²) | 213288.00 | 185.55 | 283.67 | 0.00 | 0.00 | 0.00 | 317.80 | 983.90 |
MODIS wildfire detections:
| Variable Name | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| latitude | 22277.00 | 42.46 | 0.40 | 41.81 | 42.14 | 42.40 | 42.74 | 43.73 |
| longitude | 22277.00 | -7.81 | 0.71 | -9.27 | -8.45 | -7.78 | -7.18 | -6.73 |
| brightness | 22277.00 | 325.26 | 22.83 | 300.00 | 309.70 | 319.20 | 333.70 | 505.40 |
| scan | 22277.00 | 1.73 | 0.89 | 1.00 | 1.10 | 1.40 | 2.10 | 4.80 |
| track | 22277.00 | 1.25 | 0.27 | 1.00 | 1.00 | 1.20 | 1.40 | 2.00 |
| confidence | 22277.00 | 73.45 | 23.24 | 0.00 | 59.00 | 77.00 | 95.00 | 100.00 |
| bright_t31 | 22277.00 | 293.05 | 10.20 | 265.10 | 286.40 | 292.20 | 299.90 | 400.10 |
| frp | 22277.00 | 66.26 | 125.29 | 0.00 | 14.80 | 30.20 | 66.90 | 2956.20 |
| type | 22277.00 | 0.01 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
| year | 22277.00 | 2009.29 | 5.97 | 2001.00 | 2005.00 | 2006.00 | 2013.00 | 2022.00 |
| season | 22277.00 | 2.94 | 0.85 | 1.00 | 3.00 | 3.00 | 4.00 | 4.00 |
| month | 22277.00 | 7.26 | 2.48 | 1.00 | 7.00 | 8.00 | 9.00 | 12.00 |
| week | 22277.00 | 29.55 | 10.57 | 1.00 | 27.00 | 32.00 | 36.00 | 53.00 |
| day_of_week | 22277.00 | 4.06 | 2.10 | 1.00 | 2.00 | 4.00 | 6.00 | 7.00 |
| hour | 22277.00 | 16.20 | 5.73 | 3.00 | 12.00 | 14.00 | 23.00 | 24.00 |
| day_of_month | 22277.00 | 14.62 | 7.74 | 1.00 | 8.00 | 14.00 | 20.00 | 31.00 |
| day_of_year | 22277.00 | 204.34 | 74.30 | 1.00 | 188.00 | 221.00 | 248.00 | 366.00 |
Wildfire incident records with burned area:
| Variable Name | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| idprovincia | 72757.00 | 28.26 | 8.02 | 15.00 | 27.00 | 32.00 | 36.00 | 36.00 |
| burnt_area | 72757.00 | 5.09 | 60.08 | 0.00 | 0.05 | 0.23 | 1.00 | 7352.14 |
| latitude | 72577.00 | 42.54 | 0.53 | 4.67 | 42.17 | 42.44 | 42.88 | 78.64 |
| longitude | 72575.00 | -8.07 | 0.71 | -9.43 | -8.57 | -8.10 | -7.60 | 47.51 |
| year | 72757.00 | 2007.73 | 4.00 | 2003.00 | 2004.00 | 2006.00 | 2011.00 | 2018.00 |
| season | 72757.00 | 2.79 | 0.92 | 1.00 | 2.00 | 3.00 | 3.00 | 4.00 |
| month | 72757.00 | 6.53 | 2.65 | 1.00 | 4.00 | 7.00 | 8.00 | 12.00 |
| week | 72757.00 | 26.60 | 11.46 | 1.00 | 15.00 | 30.00 | 35.00 | 53.00 |
| day_of_week | 72757.00 | 4.14 | 2.02 | 1.00 | 2.00 | 4.00 | 6.00 | 7.00 |
| hour | 72757.00 | 15.98 | 6.14 | 1.00 | 14.00 | 17.00 | 20.00 | 24.00 |
| day_of_month | 72757.00 | 15.65 | 8.35 | 1.00 | 9.00 | 16.00 | 22.00 | 31.00 |
| day_of_year | 72757.00 | 183.25 | 80.36 | 1.00 | 104.00 | 207.00 | 243.00 | 366.00 |
Monthly wildfire emissions:
| Variable Name | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| year | 264.00 | 2012.50 | 6.36 | 2002.00 | 2007.00 | 2012.50 | 2018.00 | 2023.00 |
| month | 264.00 | 6.50 | 3.46 | 1.00 | 3.75 | 6.50 | 9.25 | 12.00 |
| CO2 | 264.00 | 51182.32 | 249922.85 | 0.00 | 370.15 | 6157.90 | 20546.72 | 3546581.06 |
| CO | 264.00 | 2417.84 | 11604.11 | 0.00 | 17.43 | 314.05 | 981.43 | 163093.14 |
| TPM | 264.00 | 411.11 | 1980.41 | 0.00 | 2.35 | 50.55 | 177.45 | 27643.65 |
| PM25 | 264.00 | 307.18 | 1495.35 | 0.00 | 1.98 | 39.04 | 127.99 | 20972.60 |
| TPC | 264.00 | 201.81 | 979.10 | 0.00 | 1.04 | 22.74 | 93.03 | 13548.09 |
| NMHC | 264.00 | 196.74 | 929.36 | 0.00 | 1.20 | 24.87 | 89.07 | 12881.31 |
| OC | 264.00 | 187.38 | 910.65 | 0.00 | 0.98 | 20.41 | 87.47 | 12582.94 |
| CH4 | 264.00 | 90.63 | 425.13 | 0.00 | 0.61 | 12.09 | 39.37 | 5923.37 |
| SO2 | 264.00 | 24.15 | 117.54 | 0.00 | 0.13 | 2.85 | 10.24 | 1640.21 |
| BC | 264.00 | 14.29 | 68.02 | 0.00 | 0.10 | 1.82 | 5.87 | 954.82 |
| NOx | 264.00 | 88.06 | 435.89 | 0.00 | 0.58 | 10.54 | 36.76 | 6250.45 |
3. Data Analysis
- Describe your data analysis and explain what you've learned about the dataset.
Through data collection, cleaning, analysis, interpretation and visualization, we learned how important it is to aggregate data correctly, remove missing values and unnecessary attributes, merge datasets correctly, parse dates and format features correctly, and use only the correlated features in our visualizations and ML model. The plots we created reveal the following trends and insights:
- From Figure 1: Galicia is the region of Spain most prone to wildfires, with the highest number of fire incidents and the largest burned area.
- From Figure 2: There is an increasing yearly trend in tree cover loss in Galicia, and a considerable share of it is caused by wildfires, which also show an increasing trend.
- From Figure 3 and 3.1: Ourense is the subregion most prone to wildfires, with a critically high number of fire alerts.
- From Figure 4: We see where serious wildfire incidents (burning more than 100 ha) happened in the past, and in which subregion. The diameter of each circle shows the danger level of the corresponding historical incident in Galicia.
- From Figure 5: This figure shows the severity of wildfire incidents in Galicia using the historical Fire Radiative Power generated by the fires. Severity is represented as a heatmap.
- From Figure 6: August is the most critical month for wildfires, because it has the highest total emissions of CO2, the main greenhouse gas released by wildfires.
- From Figure 7: Wildfires also release other very harmful pollutants into the environment, so they harm a zone not only by removing its tree cover and vegetation but also by polluting it in many ways. August is again the most critical month, with the highest bars for released pollutants.
- From Figure 8: The calendar plot gives an overall view of the frequency of daily wildfire incidents (2001-2022), so that we can see which days, months and years were hit harder than others and recognize the pattern of wildfire likelihood.
- From Figure 9: We can identify the weather parameters most correlated with fire incidents, and whether their relationship with wildfires is positive or negative.
The rest of the figures show the results of our machine learning prediction model, which is explained in the next section.
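The kind of correlation screening behind Figure 9 can be sketched as below. The daily values are made up for illustration, and the cut-off `MIN_ABS_CORR = 0.2` is an assumed stand-in for "at least low-moderate correlation"; the notebook's own feature list and threshold may differ.

```python
import pandas as pd

# Made-up daily values, for illustration only.
daily = pd.DataFrame({
    "temperature": [12, 14, 18, 22, 26, 30, 33, 16],
    "humidity": [90, 88, 75, 60, 45, 35, 28, 80],
    "wind": [10, 12, 8, 15, 9, 14, 11, 13],
    "fire_count": [0, 0, 1, 2, 4, 7, 9, 0],
})

# Pearson correlation of every weather feature with the daily fire count.
corr_with_fires = (
    daily.corr(method="pearson")["fire_count"].drop("fire_count")
)
print(corr_with_fires.round(2))

# Keep features whose absolute correlation reaches the assumed cut-off.
MIN_ABS_CORR = 0.2
selected = corr_with_fires[corr_with_fires.abs() >= MIN_ABS_CORR].index.tolist()
print(selected)  # ['temperature', 'humidity']
```

The sign of each coefficient shows the direction of the relationship: in this toy data temperature is positively and humidity negatively correlated with fire counts, while wind falls below the cut-off.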
- If relevant, talk about your machine-learning.
In our machine learning prediction model we aimed to predict, for certain weather parameters (those with at least a low-to-moderate correlation with wildfire incidents), the critical threshold value beyond which a high likelihood of wildfires can be expected. We use these threshold values to mark red danger zones on future weather forecast data, so that we can identify the dates and times at which a high risk of wildfires can be expected. These predicted dates and times can then be used to protect the irreplaceable ecological zones and the areas with fire-prone vegetation, which are shown in the choropleth maps at the end.
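To make the idea of a threshold and a red zone concrete, here is a minimal sketch. It does not reproduce the model used in this notebook: as a simple stand-in, the threshold is taken as the lower quartile of the temperatures observed when fires occurred. All values, column names and the choice of the 25 % quantile are assumptions for illustration.

```python
import pandas as pd

# Made-up history: temperature readings and whether a fire was detected.
history = pd.DataFrame({
    "temperature": [14.0, 18.0, 21.0, 24.0, 27.0, 29.0, 31.0, 34.0],
    "fire": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Stand-in threshold: lower quartile of temperatures seen during fires.
fire_temperatures = history.loc[history["fire"] == 1, "temperature"]
threshold = fire_temperatures.quantile(0.25)
print(threshold)  # 27.0

# Made-up forecast for a few future timestamps.
forecast = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-05-01 12:00",
        "2024-05-02 12:00",
        "2024-05-03 12:00",
        "2024-05-04 12:00",
    ]),
    "temperature": [19.5, 28.2, 30.1, 22.0],
})

# A forecast value at or above the threshold falls inside the red zone.
forecast["in_red_zone"] = forecast["temperature"] >= threshold
red_zone_times = forecast.loc[forecast["in_red_zone"], "time"].tolist()
print(red_zone_times)
```

For a negatively correlated parameter such as relative humidity, the comparison would be reversed, with the red zone lying below the threshold.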
4. Genre
Why magazine style genre?
For this project we chose the magazine style for presenting our findings and visualizing the data. The genre offers several advantages that make it an effective medium for the message we aim to convey.
Firstly, the magazine style is well suited for presenting complex information in an engaging and accessible manner, allowing us to structure the content in a curated and visually appealing format. Ease of access and low barriers to entry are important for our target audience, who may not have a technical background or prior knowledge of the subject. The magazine style lets us present the data and information in a way that is both informative and entertaining, making it more likely that our audience will engage with the content and make informed decisions.
Secondly, the magazine style is well suited for combining different visualizations and diverse narrative tools in a curated way. By incorporating images, data visualizations and text we can visually highlight important features and guide the viewer through the narrative. The style also accommodates interactive elements, such as hover highlighting and filtering, which can enhance the user experience and provide additional insights for more curious and/or professional users.
Thirdly, the style works well with our linear, curated narrative. By structuring the content in a logical and sequential manner, we believe we can convey the information effectively. The linear narrative also allows us to use captions, headlines and introductory text to provide context and summarize key points.
Lastly, the magazine style is an effective medium for presenting to a broad audience, which matches our aspiration to present the data to locals and tourists alike, who may have vastly different knowledge of the subject.
The style is widely recognizable and its layout is familiar to most readers, which improves mapping and affordance. We believe we can reach a wider audience by sticking to a familiar style. This is especially true for our machine learning feature of future risk predictions based on meteorological forecasts, where a style similar to existing platforms (such as weather forecasts) is important for ease of access for repeat users.
Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer). Why?
Visual narrative tools:
Visual structuring: Consistent visual platform
As touched upon, we strive to give users a curated experience so that they can make informed decisions. This tool ensures that the visual elements in our narrative are presented in a consistent and cohesive manner, making it easier for the audience to follow the narrative.
Highlighting: Feature distinction
To draw attention to specific features within the narrative, our main tool has been feature distinction. This is achieved through various visual elements.
Transition guidance: Object continuity and viewing angle
To help the user's journey through the narrative we have strived for consistency through object continuity. This helps create a sense of continuity and supports our curated approach. We also strive for consistency in viewing angles: the perspective we present to the user is kept as constant as possible. By combining object continuity with a consistent viewing angle, we can create a more engaging and immersive visual narrative that guides the user's attention.
Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer). Why?
Narrative structure:
Ordering: Linear
Our storytelling takes a linear approach, with the data presented sequentially and chronologically. This creates a clear and logical structure, making it easier for the audience to follow.
Interactivity: Hover highlighting, filtering, navigation
Interactivity is a key element in our storytelling, as it facilitates audience engagement. In our case we used various interactivity tools to achieve this.
Messaging: Captions/Headlines, introductory text, summaries
In order to target a broad audience we need to make sure key messages are conveyed effectively. This is even more important given the risks of misunderstanding the wildfire data. We therefore strive to communicate the main ideas and insights of our findings in a clear and concise way.
5. Visualizations
- Explain the visualizations you've chosen. & - Why are they right for the story you want to tell?
Figure 1: Bar and line chart
Mean burned area and wildfire incidents per km2 by region in Spain (2002-2023), showing regional variations in fire severity and frequency. The number of fire incidents is shown with the line and the total burned area with the bars. The x-axis lists the regions of Spain; the y-axes show the burned area and the number of fire incidents. This plot is used to choose the most critical region of Spain, the one most prone to wildfires.
Figure 2: Stacked bar chart (the bars are not added to each other; instead, tree loss due to wildfires is overlaid on total tree loss to show what share of the total loss is caused by wildfires)
Annual tree cover loss in Galicia, Spain from 2001 to 2023, distinguishing between total loss (green) and loss specifically due to wildfires (red), highlighting years with significant wildfire impact and the general trend of tree cover loss.
Figure 3: Pie chart
Distribution of tree cover across the subregions of Galicia, with the area of each subregion given in hectares.
Figure 3.1: Horizontal bar chart
Total alerts by subregion of Galicia, showing the number of alerts reported in each subregion from highest to lowest. Ourense is shown as the most critical subregion of Galicia, the one most prone to wildfires.
Figure 4: Interactive geospatial map overlaid with a point cluster map
Geospatial map of historical wildfires in Galicia that burned more than 100 ha. Each subregion's historical wildfires are shown in a different colour, and the bigger the burned area of a wildfire, the bigger the circle diameter. From this map we can spot the exact locations where severe fire incidents happened in the past, which might indicate high-risk zones.
Figure 5: Geospatial heatmap
Heatmap of wildfire severity in Galicia from 2001-2022, categorized by Fire Radiative Power (FRP) in megawatts, showing areas with low, medium and high wildfire intensity, which can later be used to detect high-risk zones.
Figure 6: Interactive radial polar bar chart
The plot displays the monthly distribution of CO2 emissions, in metric tonnes per kilogram of dry matter burned, from wildfires in Galicia from 2002-2023, highlighting the peak emissions during the summer months.
Figure 7: Interactive horizontal grouped bar chart
Horizontal bar chart showing the monthly distribution of pollutants released by wildfire incidents, detailing the total emissions in metric tonnes per month for pollutants like CO (shown as a separate bar, grouped alongside the other stacked pollutants), CH4, NOx and others, highlighting the peak emissions during July, August, September and October.
Figure 8: Calendar plot
Shows the total number of detected fire incidents per day (0 to 50+) in Galicia between 2001-2022.
Figure 9: Correlation matrix
Correlation heatmap for the weather parameters that are more correlated with wildfire incidents than the other weather parameters. The selected features are temperature, humidity, precipitation, wind speed, soil temperature, soil moisture, evapotranspiration, vapour pressure deficit and solar radiation.
Figure 10: Boxplot
Box plots displaying, for the key weather parameters, the value ranges associated with a high likelihood of wildfire occurrence according to the ML model. The parameters include temperature, humidity, precipitation, wind speed, soil temperature, soil moisture, evapotranspiration, vapour pressure deficit and solar radiation, which are crucial for understanding wildfire risks.
Figure 11: Interactive time series plot
Using the threshold values that might trigger a high probability of wildfire occurrence, we generated red zones, which are the danger zones: if a weather parameter value enters the red zone, we can expect conditions that are likely to allow a wildfire to happen. We plot the future weather forecast to find the exact dates and times at which a weather parameter enters the red zone, and use that information to predict possible future wildfires for a location whose weather conditions we know.
Figure 12 and 13: Choropleth maps
The purple choropleth map shows ecologically irreplaceable zones which should be protected, so that they can be prioritized when preemptive wildfire measures are taken. The red choropleth map shows high-risk zones based on vegetation and wildfire modelling, which can be used to raise wildfire awareness in those areas so that more precautions are taken.
6. Discussion
What went well?
We found many useful open-source datasets that back up the arguments and aims of our prediction model project. Thanks to the availability and usefulness of these datasets, we were able to reach an accuracy score of 87%.
What is still missing? What could be improved? Why?
As we mentioned before, there can be several causes of tree cover loss and wildfires, such as human mistakes, arson, intentional man-made preemptive forest fires, forestry, shifting agriculture and urbanization. Our wildfire dataset has no attribute for making that distinction, which weakens the correlation of the weather features with wildfire incidents considerably. Only for naturally occurring, climate-driven wildfires can we draw conclusions about how strongly weather conditions matter for predicting the location and time of future wildfires. If that missing attribute could be obtained, we believe our model would improve remarkably.
7. Contributions
Coding Lead: Ali Berk Gezgin Support: Nael Rashdeen, Joakim Wiben Gundersen
Website/GitHub Lead: Nael Rashdeen Support: Ali Berk Gezgin, Joakim Wiben Gundersen
Narrative (writing) Lead: Joakim Wiben Gundersen Support: Nael Rashdeen, Ali Berk Gezgin
Loading the package libraries
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib import cm
import seaborn as sns
from matplotlib.lines import Line2D
from matplotlib.colors import LinearSegmentedColormap
from bokeh.plotting import figure, show, output_file
from bokeh.models import ColumnDataSource, Legend, LegendItem
from bokeh.layouts import column
from bokeh.io import output_notebook
from bokeh.models import HoverTool
from bokeh.palettes import Category20
import itertools
from mpl_toolkits.basemap import Basemap
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import calplot
import geopandas as gpd
import folium
from folium.plugins import HeatMap
from shapely.geometry import Point
from shapely.geometry import Polygon
Data Preprocessing & Cleaning
Loading of Hourly Weather dataset for Galicia for the dates between 01-01-2000 00:00 to 30-04-2024 23:00
galicia_weather = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Galicia_hourly_weather_data_00_24.csv")
galicia_weather.head(5)
| | time | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | vapour_pressure_deficit (kPa) | ... | soil_moisture_7_to_28cm (m³/m³) | soil_moisture_28_to_100cm (m³/m³) | soil_moisture_100_to_255cm (m³/m³) | is_day () | sunshine_duration (s) | shortwave_radiation_instant (W/m²) | direct_radiation_instant (W/m²) | diffuse_radiation_instant (W/m²) | direct_normal_irradiance_instant (W/m²) | terrestrial_radiation_instant (W/m²) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01T00:00 | 1.9 | 83 | -0.7 | 0.0 | 1029.0 | 963.6 | 29 | 0.0 | 0.12 | ... | 0.403 | 0.415 | 0.399 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2000-01-01T01:00 | 1.3 | 85 | -0.9 | 0.0 | 1029.2 | 963.6 | 24 | 0.0 | 0.10 | ... | 0.402 | 0.415 | 0.399 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2000-01-01T02:00 | -0.1 | 89 | -1.7 | 0.0 | 1029.3 | 963.4 | 24 | 0.0 | 0.07 | ... | 0.402 | 0.415 | 0.399 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2000-01-01T03:00 | -1.7 | 92 | -2.9 | 0.0 | 1028.8 | 962.6 | 16 | 0.0 | 0.05 | ... | 0.402 | 0.415 | 0.399 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2000-01-01T04:00 | -2.2 | 92 | -3.3 | 0.0 | 1028.7 | 962.4 | 8 | 0.0 | 0.04 | ... | 0.402 | 0.414 | 0.399 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 27 columns
galicia_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213288 entries, 0 to 213287
Data columns (total 27 columns):
 #  Column  Non-Null Count  Dtype
 0  time  213288 non-null  object
 1  temperature_2m (°C)  213288 non-null  float64
 2  relative_humidity_2m (%)  213288 non-null  int64
 3  dew_point_2m (°C)  213288 non-null  float64
 4  precipitation (mm)  213288 non-null  float64
 5  pressure_msl (hPa)  213288 non-null  float64
 6  surface_pressure (hPa)  213288 non-null  float64
 7  cloud_cover (%)  213288 non-null  int64
 8  et0_fao_evapotranspiration (mm)  213288 non-null  float64
 9  vapour_pressure_deficit (kPa)  213288 non-null  float64
 10  wind_speed_10m (km/h)  213288 non-null  float64
 11  wind_gusts_10m (km/h)  213288 non-null  float64
 12  soil_temperature_0_to_7cm (°C)  213288 non-null  float64
 13  soil_temperature_7_to_28cm (°C)  213288 non-null  float64
 14  soil_temperature_28_to_100cm (°C)  213288 non-null  float64
 15  soil_temperature_100_to_255cm (°C)  213288 non-null  float64
 16  soil_moisture_0_to_7cm (m³/m³)  213288 non-null  float64
 17  soil_moisture_7_to_28cm (m³/m³)  213288 non-null  float64
 18  soil_moisture_28_to_100cm (m³/m³)  213288 non-null  float64
 19  soil_moisture_100_to_255cm (m³/m³)  213288 non-null  float64
 20  is_day ()  213288 non-null  int64
 21  sunshine_duration (s)  213288 non-null  float64
 22  shortwave_radiation_instant (W/m²)  213288 non-null  float64
 23  direct_radiation_instant (W/m²)  213288 non-null  float64
 24  diffuse_radiation_instant (W/m²)  213288 non-null  float64
 25  direct_normal_irradiance_instant (W/m²)  213288 non-null  float64
 26  terrestrial_radiation_instant (W/m²)  213288 non-null  float64
dtypes: float64(23), int64(3), object(1)
memory usage: 43.9+ MB
Loading the MODIS fire datasets, one file per year for 2000-2022 (after filtering, the Galicia detections start in 2001)
fire2022 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2022_Spain.csv")
fire2021 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2021_Spain.csv")
fire2020 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2020_Spain.csv")
fire2019 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2019_Spain.csv")
fire2018 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2018_Spain.csv")
fire2017 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2017_Spain.csv")
fire2016 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2016_Spain.csv")
fire2015 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2015_Spain.csv")
fire2014 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2014_Spain.csv")
fire2013 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2013_Spain.csv")
fire2012 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2012_Spain.csv")
fire2011 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2011_Spain.csv")
fire2010 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2010_Spain.csv")
fire2009 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2009_Spain.csv")
fire2008 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2008_Spain.csv")
fire2007 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2007_Spain.csv")
fire2006 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2006_Spain.csv")
fire2005 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2005_Spain.csv")
fire2004 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2004_Spain.csv")
fire2003 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2003_Spain.csv")
fire2002 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2002_Spain.csv")
fire2001 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2001_Spain.csv")
fire2000 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_incidents_galicia\modis_2000_Spain.csv")
fire2016.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.4253 | -1.4254 | 305.4 | 1.5 | 1.2 | 2016-01-05 | 1125 | Terra | MODIS | 61 | 6.2 | 277.4 | 18.2 | D | 0 |
| 1 | 37.5847 | -5.8172 | 302.1 | 1.1 | 1.0 | 2016-01-05 | 1126 | Terra | MODIS | 48 | 6.2 | 284.7 | 6.9 | D | 0 |
| 2 | 38.7263 | -0.7202 | 301.5 | 1.1 | 1.0 | 2016-01-05 | 1304 | Aqua | MODIS | 32 | 6.2 | 286.5 | 5.9 | D | 0 |
| 3 | 38.7225 | -0.7440 | 300.5 | 1.1 | 1.0 | 2016-01-05 | 1304 | Aqua | MODIS | 22 | 6.2 | 285.9 | 5.2 | D | 0 |
| 4 | 38.7153 | -0.7298 | 326.4 | 1.1 | 1.0 | 2016-01-05 | 1304 | Aqua | MODIS | 80 | 6.2 | 286.8 | 27.7 | D | 0 |
fire2016.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3211 entries, 0 to 3210
Data columns (total 15 columns):
 #  Column  Non-Null Count  Dtype
 0  latitude  3211 non-null  float64
 1  longitude  3211 non-null  float64
 2  brightness  3211 non-null  float64
 3  scan  3211 non-null  float64
 4  track  3211 non-null  float64
 5  acq_date  3211 non-null  object
 6  acq_time  3211 non-null  int64
 7  satellite  3211 non-null  object
 8  instrument  3211 non-null  object
 9  confidence  3211 non-null  int64
 10  version  3211 non-null  float64
 11  bright_t31  3211 non-null  float64
 12  frp  3211 non-null  float64
 13  daynight  3211 non-null  object
 14  type  3211 non-null  int64
dtypes: float64(8), int64(3), object(4)
memory usage: 376.4+ KB
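The 23 near-identical `read_csv` calls above could also be written as a loop. The following is a minimal sketch rather than the notebook's own code: the helper name `load_yearly_fire_files` and its `data_dir` argument are hypothetical, and it assumes the files follow the `modis_<year>_Spain.csv` naming shown above. The demo writes two tiny made-up files to a temporary directory so that it runs on its own.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_yearly_fire_files(data_dir, years):
    """Read one modis_<year>_Spain.csv per year and stack them into one frame."""
    frames = []
    for year in years:
        path = Path(data_dir) / f"modis_{year}_Spain.csv"
        frames.append(pd.read_csv(path))
    # ignore_index=True gives a fresh 0..n-1 index, like reset_index(drop=True)
    return pd.concat(frames, axis=0, ignore_index=True)


# Self-contained demo with two made-up files (not the real MODIS data)
with tempfile.TemporaryDirectory() as tmp:
    for year, lat in [(2000, 43.5), (2001, 42.5)]:
        pd.DataFrame({"latitude": [lat], "acq_date": [f"{year}-11-01"]}).to_csv(
            Path(tmp) / f"modis_{year}_Spain.csv", index=False
        )
    demo_fires = load_yearly_fire_files(tmp, range(2000, 2002))

print(demo_fires)
```

With the real data, the call would presumably be `load_yearly_fire_files(<folder with the yearly files>, range(2000, 2023))`.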
In the cells below, we merge the yearly files and filter them to the coordinates of the Galicia region. We then parse the timestamps and add time-period columns (season, month, week of the year, day of the year, hour of the day, day of the week, etc.), and drop missing values and unnecessary columns.
merged_forest_fire_incidents_galicia_2000_2022=pd.concat([fire2000,fire2001,fire2002,fire2003,fire2004,fire2005,fire2006
,fire2007,fire2008,fire2009,fire2010,fire2011,fire2012,fire2013
,fire2014,fire2015,fire2016,fire2017,fire2018,fire2019,fire2020
,fire2021,fire2022], axis=0)
merged_forest_fire_incidents_galicia_2000_2022.reset_index(drop=True, inplace=True)
merged_forest_fire_incidents_galicia_2000_2022.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43.5249 | -5.7303 | 301.1 | 1.0 | 1.0 | 2000-11-01 | 1131 | Terra | MODIS | 45 | 6.2 | 269.8 | 7.8 | D | 2 |
| 1 | 41.5184 | -2.0833 | 312.4 | 1.1 | 1.1 | 2000-11-01 | 1132 | Terra | MODIS | 55 | 6.2 | 280.1 | 15.8 | D | 0 |
| 2 | 41.3399 | -2.6720 | 309.7 | 1.1 | 1.0 | 2000-11-01 | 1132 | Terra | MODIS | 0 | 6.2 | 274.0 | 12.6 | D | 0 |
| 3 | 40.2732 | -3.1756 | 319.2 | 1.1 | 1.0 | 2000-11-01 | 1132 | Terra | MODIS | 79 | 6.2 | 288.3 | 19.9 | D | 0 |
| 4 | 40.2479 | -3.4714 | 304.2 | 1.1 | 1.0 | 2000-11-01 | 1132 | Terra | MODIS | 58 | 6.2 | 285.4 | 6.1 | D | 0 |
min_longitude, max_longitude = -9.30, -6.73
min_latitude, max_latitude = 41.8, 43.8
filtered_galicia_fires_00_22 = merged_forest_fire_incidents_galicia_2000_2022[
(merged_forest_fire_incidents_galicia_2000_2022['longitude'] >= min_longitude) &
(merged_forest_fire_incidents_galicia_2000_2022['longitude'] <= max_longitude) &
(merged_forest_fire_incidents_galicia_2000_2022['latitude'] >= min_latitude) &
(merged_forest_fire_incidents_galicia_2000_2022['latitude'] <= max_latitude)
]
filtered_galicia_fires_00_22.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 42.5118 | -8.4374 | 300.4 | 1.1 | 1.0 | 2001-02-17 | 1154 | Terra | MODIS | 36 | 6.2 | 286.7 | 5.6 | D | 0 |
| 177 | 42.2953 | -8.2946 | 305.0 | 1.0 | 1.0 | 2001-02-19 | 1142 | Terra | MODIS | 60 | 6.2 | 283.9 | 8.7 | D | 0 |
| 178 | 42.2688 | -8.2864 | 311.8 | 1.0 | 1.0 | 2001-02-19 | 2248 | Terra | MODIS | 83 | 6.2 | 275.9 | 16.2 | N | 0 |
| 186 | 42.2428 | -6.8630 | 314.8 | 1.1 | 1.0 | 2001-02-21 | 1130 | Terra | MODIS | 71 | 6.2 | 279.2 | 15.2 | D | 0 |
| 187 | 42.2881 | -8.3451 | 317.1 | 1.2 | 1.1 | 2001-02-21 | 1130 | Terra | MODIS | 77 | 6.2 | 288.3 | 20.2 | D | 0 |
filtered_galicia_fires_00_22 = filtered_galicia_fires_00_22.drop(['satellite', 'instrument','version'], axis=1)
filtered_galicia_fires_00_22.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | confidence | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 42.5118 | -8.4374 | 300.4 | 1.1 | 1.0 | 2001-02-17 | 1154 | 36 | 286.7 | 5.6 | D | 0 |
| 177 | 42.2953 | -8.2946 | 305.0 | 1.0 | 1.0 | 2001-02-19 | 1142 | 60 | 283.9 | 8.7 | D | 0 |
| 178 | 42.2688 | -8.2864 | 311.8 | 1.0 | 1.0 | 2001-02-19 | 2248 | 83 | 275.9 | 16.2 | N | 0 |
| 186 | 42.2428 | -6.8630 | 314.8 | 1.1 | 1.0 | 2001-02-21 | 1130 | 71 | 279.2 | 15.2 | D | 0 |
| 187 | 42.2881 | -8.3451 | 317.1 | 1.2 | 1.1 | 2001-02-21 | 1130 | 77 | 288.3 | 20.2 | D | 0 |
Merged and filtered fire data for the Galicia coordinates, after dropping missing values
filtered_galicia_fires_00_22 = filtered_galicia_fires_00_22.dropna()
filtered_galicia_fires_00_22.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | confidence | bright_t31 | frp | daynight | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 42.5118 | -8.4374 | 300.4 | 1.1 | 1.0 | 2001-02-17 | 1154 | 36 | 286.7 | 5.6 | D | 0 |
| 177 | 42.2953 | -8.2946 | 305.0 | 1.0 | 1.0 | 2001-02-19 | 1142 | 60 | 283.9 | 8.7 | D | 0 |
| 178 | 42.2688 | -8.2864 | 311.8 | 1.0 | 1.0 | 2001-02-19 | 2248 | 83 | 275.9 | 16.2 | N | 0 |
| 186 | 42.2428 | -6.8630 | 314.8 | 1.1 | 1.0 | 2001-02-21 | 1130 | 71 | 279.2 | 15.2 | D | 0 |
| 187 | 42.2881 | -8.3451 | 317.1 | 1.2 | 1.1 | 2001-02-21 | 1130 | 77 | 288.3 | 20.2 | D | 0 |
filtered_galicia_fires_00_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 12 columns):
 #  Column  Non-Null Count  Dtype
 0  latitude  22277 non-null  float64
 1  longitude  22277 non-null  float64
 2  brightness  22277 non-null  float64
 3  scan  22277 non-null  float64
 4  track  22277 non-null  float64
 5  acq_date  22277 non-null  object
 6  acq_time  22277 non-null  int64
 7  confidence  22277 non-null  int64
 8  bright_t31  22277 non-null  float64
 9  frp  22277 non-null  float64
 10  daynight  22277 non-null  object
 11  type  22277 non-null  int64
dtypes: float64(7), int64(3), object(2)
memory usage: 2.2+ MB
Parsing the weather and fire datasets to add columns for the year, season, month, week, day of the week, hour, etc.
galicia_weather['time'] = pd.to_datetime(galicia_weather['time'], format='%Y-%m-%dT%H:%M')
galicia_weather['year'] = galicia_weather['time'].dt.year
# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1 # Winter
    elif month in [3, 4, 5]:
        return 2 # Spring
    elif month in [6, 7, 8]:
        return 3 # Summer
    else:
        return 4 # Autumn
# Applying the function to the data
galicia_weather['season'] = galicia_weather['time'].dt.month.apply(get_season)
# Extracting the month
galicia_weather['month'] = galicia_weather['time'].dt.month
# Extracting the week of the year
galicia_weather['week'] = galicia_weather['time'].dt.isocalendar().week
# Extracting the day of the week (1 = Monday, 7 = Sunday)
galicia_weather['day_of_week'] = galicia_weather['time'].dt.dayofweek + 1
# Extracting the hour, shifted by one so values run from 1 to 24 (hour 1 = 00:00-00:59)
galicia_weather['hour'] = galicia_weather['time'].dt.hour + 1
# Extracting the day of the month
galicia_weather['day_of_month'] = galicia_weather['time'].dt.day
# Extracting the day of the year
galicia_weather['day_of_year'] = galicia_weather['time'].dt.dayofyear
galicia_weather.head(5)
| | time | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | vapour_pressure_deficit (kPa) | ... | direct_normal_irradiance_instant (W/m²) | terrestrial_radiation_instant (W/m²) | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01 00:00:00 | 1.9 | 83 | -0.7 | 0.0 | 1029.0 | 963.6 | 29 | 0.0 | 0.12 | ... | 0.0 | 0.0 | 2000 | 1 | 1 | 52 | 6 | 1 | 1 | 1 |
| 1 | 2000-01-01 01:00:00 | 1.3 | 85 | -0.9 | 0.0 | 1029.2 | 963.6 | 24 | 0.0 | 0.10 | ... | 0.0 | 0.0 | 2000 | 1 | 1 | 52 | 6 | 2 | 1 | 1 |
| 2 | 2000-01-01 02:00:00 | -0.1 | 89 | -1.7 | 0.0 | 1029.3 | 963.4 | 24 | 0.0 | 0.07 | ... | 0.0 | 0.0 | 2000 | 1 | 1 | 52 | 6 | 3 | 1 | 1 |
| 3 | 2000-01-01 03:00:00 | -1.7 | 92 | -2.9 | 0.0 | 1028.8 | 962.6 | 16 | 0.0 | 0.05 | ... | 0.0 | 0.0 | 2000 | 1 | 1 | 52 | 6 | 4 | 1 | 1 |
| 4 | 2000-01-01 04:00:00 | -2.2 | 92 | -3.3 | 0.0 | 1028.7 | 962.4 | 8 | 0.0 | 0.04 | ... | 0.0 | 0.0 | 2000 | 1 | 1 | 52 | 6 | 5 | 1 | 1 |
5 rows × 35 columns
galicia_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213288 entries, 0 to 213287
Data columns (total 35 columns):
 #  Column  Non-Null Count  Dtype
 0  time  213288 non-null  datetime64[ns]
 1  temperature_2m (°C)  213288 non-null  float64
 2  relative_humidity_2m (%)  213288 non-null  int64
 3  dew_point_2m (°C)  213288 non-null  float64
 4  precipitation (mm)  213288 non-null  float64
 5  pressure_msl (hPa)  213288 non-null  float64
 6  surface_pressure (hPa)  213288 non-null  float64
 7  cloud_cover (%)  213288 non-null  int64
 8  et0_fao_evapotranspiration (mm)  213288 non-null  float64
 9  vapour_pressure_deficit (kPa)  213288 non-null  float64
 10  wind_speed_10m (km/h)  213288 non-null  float64
 11  wind_gusts_10m (km/h)  213288 non-null  float64
 12  soil_temperature_0_to_7cm (°C)  213288 non-null  float64
 13  soil_temperature_7_to_28cm (°C)  213288 non-null  float64
 14  soil_temperature_28_to_100cm (°C)  213288 non-null  float64
 15  soil_temperature_100_to_255cm (°C)  213288 non-null  float64
 16  soil_moisture_0_to_7cm (m³/m³)  213288 non-null  float64
 17  soil_moisture_7_to_28cm (m³/m³)  213288 non-null  float64
 18  soil_moisture_28_to_100cm (m³/m³)  213288 non-null  float64
 19  soil_moisture_100_to_255cm (m³/m³)  213288 non-null  float64
 20  is_day ()  213288 non-null  int64
 21  sunshine_duration (s)  213288 non-null  float64
 22  shortwave_radiation_instant (W/m²)  213288 non-null  float64
 23  direct_radiation_instant (W/m²)  213288 non-null  float64
 24  diffuse_radiation_instant (W/m²)  213288 non-null  float64
 25  direct_normal_irradiance_instant (W/m²)  213288 non-null  float64
 26  terrestrial_radiation_instant (W/m²)  213288 non-null  float64
 27  year  213288 non-null  int64
 28  season  213288 non-null  int64
 29  month  213288 non-null  int64
 30  week  213288 non-null  UInt32
 31  day_of_week  213288 non-null  int64
 32  hour  213288 non-null  int64
 33  day_of_month  213288 non-null  int64
 34  day_of_year  213288 non-null  int64
dtypes: UInt32(1), datetime64[ns](1), float64(23), int64(10)
memory usage: 56.3 MB
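The same calendar columns are derived for several datasets in this notebook, so the steps can be collected in one helper. This is a sketch under assumptions, not the notebook's code: `add_time_features` is a hypothetical name, and the season is computed with the arithmetic `month % 12 // 3 + 1` in place of `get_season`, which yields the same 1 to 4 coding.

```python
import pandas as pd


def add_time_features(df, time_col):
    """Return a copy of df with calendar columns derived from time_col."""
    out = df.copy()
    t = out[time_col].dt
    out["year"] = t.year
    # 1 = winter (Dec-Feb), 2 = spring, 3 = summer, 4 = autumn
    out["season"] = t.month % 12 // 3 + 1
    out["month"] = t.month
    out["week"] = t.isocalendar().week  # ISO week number
    out["day_of_week"] = t.dayofweek + 1  # 1 = Monday, 7 = Sunday
    out["hour"] = t.hour + 1  # shifted to 1-24, as in the notebook
    out["day_of_month"] = t.day
    out["day_of_year"] = t.dayofyear
    return out


demo = pd.DataFrame({"time": pd.to_datetime(["2000-01-01T00:00", "2001-02-19T22:48"])})
demo = add_time_features(demo, "time")
print(demo.iloc[0].to_dict())
```

The first demo row reproduces the first row of the table above. Its week is 52 because 1 January 2000 belongs to the last ISO week of 1999, which is worth keeping in mind when grouping by `year` and `week` together.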
# Converting 'acq_date' to a datetime object
filtered_galicia_fires_00_22['acq_date'] = pd.to_datetime(filtered_galicia_fires_00_22['acq_date'], format='%Y-%m-%d')
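# Converting 'acq_time' (an integer in HHMM form) to a time object; zero-padding to four
# digits keeps values below 1000, such as 154 for 01:54, unambiguous for the '%H%M' format
filtered_galicia_fires_00_22['acq_time'] = pd.to_datetime(filtered_galicia_fires_00_22['acq_time'].astype(str).str.zfill(4), format='%H%M').dt.time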
# Combining 'acq_date' and 'acq_time' into a single datetime column
filtered_galicia_fires_00_22['datetime'] = filtered_galicia_fires_00_22.apply(lambda row: pd.Timestamp.combine(row['acq_date'], row['acq_time']), axis=1)
# Extracting the year
filtered_galicia_fires_00_22['year'] = filtered_galicia_fires_00_22['datetime'].dt.year
# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1 # Winter
    elif month in [3, 4, 5]:
        return 2 # Spring
    elif month in [6, 7, 8]:
        return 3 # Summer
    else:
        return 4 # Autumn
# Applying the function to the DataFrame
filtered_galicia_fires_00_22['season'] = filtered_galicia_fires_00_22['datetime'].dt.month.apply(get_season)
# Extracting the month
filtered_galicia_fires_00_22['month'] = filtered_galicia_fires_00_22['datetime'].dt.month
# Extracting the week of the year
filtered_galicia_fires_00_22['week'] = filtered_galicia_fires_00_22['datetime'].dt.isocalendar().week
# Extracting the day of the week (1 = Monday, 7 = Sunday)
filtered_galicia_fires_00_22['day_of_week'] = filtered_galicia_fires_00_22['datetime'].dt.dayofweek + 1
# Extracting the hour, shifted by one so values run from 1 to 24 (hour 1 = 00:00-00:59)
filtered_galicia_fires_00_22['hour'] = filtered_galicia_fires_00_22['datetime'].dt.hour + 1
# Extracting the day of the month
filtered_galicia_fires_00_22['day_of_month'] = filtered_galicia_fires_00_22['datetime'].dt.day
# Extracting the day of the year
filtered_galicia_fires_00_22['day_of_year'] = filtered_galicia_fires_00_22['datetime'].dt.dayofyear
filtered_galicia_fires_00_22.head(5)
| | latitude | longitude | brightness | scan | track | acq_date | acq_time | confidence | bright_t31 | frp | ... | type | datetime | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 42.5118 | -8.4374 | 300.4 | 1.1 | 1.0 | 2001-02-17 | 11:54:00 | 36 | 286.7 | 5.6 | ... | 0 | 2001-02-17 11:54:00 | 2001 | 1 | 2 | 7 | 6 | 12 | 17 | 48 |
| 177 | 42.2953 | -8.2946 | 305.0 | 1.0 | 1.0 | 2001-02-19 | 11:42:00 | 60 | 283.9 | 8.7 | ... | 0 | 2001-02-19 11:42:00 | 2001 | 1 | 2 | 8 | 1 | 12 | 19 | 50 |
| 178 | 42.2688 | -8.2864 | 311.8 | 1.0 | 1.0 | 2001-02-19 | 22:48:00 | 83 | 275.9 | 16.2 | ... | 0 | 2001-02-19 22:48:00 | 2001 | 1 | 2 | 8 | 1 | 23 | 19 | 50 |
| 186 | 42.2428 | -6.8630 | 314.8 | 1.1 | 1.0 | 2001-02-21 | 11:30:00 | 71 | 279.2 | 15.2 | ... | 0 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | 8 | 3 | 12 | 21 | 52 |
| 187 | 42.2881 | -8.3451 | 317.1 | 1.2 | 1.1 | 2001-02-21 | 11:30:00 | 77 | 288.3 | 20.2 | ... | 0 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | 8 | 3 | 12 | 21 | 52 |
5 rows × 21 columns
filtered_galicia_fires_00_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 21 columns):
 #  Column  Non-Null Count  Dtype
 0  latitude  22277 non-null  float64
 1  longitude  22277 non-null  float64
 2  brightness  22277 non-null  float64
 3  scan  22277 non-null  float64
 4  track  22277 non-null  float64
 5  acq_date  22277 non-null  datetime64[ns]
 6  acq_time  22277 non-null  object
 7  confidence  22277 non-null  int64
 8  bright_t31  22277 non-null  float64
 9  frp  22277 non-null  float64
 10  daynight  22277 non-null  object
 11  type  22277 non-null  int64
 12  datetime  22277 non-null  datetime64[ns]
 13  year  22277 non-null  int64
 14  season  22277 non-null  int64
 15  month  22277 non-null  int64
 16  week  22277 non-null  UInt32
 17  day_of_week  22277 non-null  int64
 18  hour  22277 non-null  int64
 19  day_of_month  22277 non-null  int64
 20  day_of_year  22277 non-null  int64
dtypes: UInt32(1), datetime64[ns](2), float64(7), int64(9), object(2)
memory usage: 3.7+ MB
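Since `acq_time` is stored as an integer in HHMM form, the date and time can also be combined with integer arithmetic instead of string parsing and a row-wise `apply`. The sketch below is an alternative, not the notebook's method; `combine_modis_datetime` is a hypothetical helper, and the third demo value (154, meaning 01:54) is made up to show a time below 1000.

```python
import pandas as pd


def combine_modis_datetime(acq_date, acq_time):
    """Combine a date column and an integer HHMM column into timestamps."""
    dates = pd.to_datetime(acq_date, format="%Y-%m-%d")
    hhmm = acq_time.astype(int)
    hours = pd.to_timedelta(hhmm // 100, unit="h")    # leading digits are the hour
    minutes = pd.to_timedelta(hhmm % 100, unit="m")   # last two digits are the minutes
    return dates + hours + minutes


demo = pd.DataFrame({
    "acq_date": ["2001-02-17", "2001-02-19", "2001-02-21"],
    "acq_time": [1154, 2248, 154],  # last value is made up for illustration
})
demo["datetime"] = combine_modis_datetime(demo["acq_date"], demo["acq_time"])
print(demo)
```

Working on whole columns avoids one Python call per row, which can matter for the roughly 22,000 detections in this table.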
frp_firedata01_22 = filtered_galicia_fires_00_22.drop(['brightness', 'scan','track','daynight', 'type','bright_t31'], axis=1)
frp_firedata01_22.head(5)
| | latitude | longitude | acq_date | acq_time | confidence | frp | datetime | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 42.5118 | -8.4374 | 2001-02-17 | 11:54:00 | 36 | 5.6 | 2001-02-17 11:54:00 | 2001 | 1 | 2 | 7 | 6 | 12 | 17 | 48 |
| 177 | 42.2953 | -8.2946 | 2001-02-19 | 11:42:00 | 60 | 8.7 | 2001-02-19 11:42:00 | 2001 | 1 | 2 | 8 | 1 | 12 | 19 | 50 |
| 178 | 42.2688 | -8.2864 | 2001-02-19 | 22:48:00 | 83 | 16.2 | 2001-02-19 22:48:00 | 2001 | 1 | 2 | 8 | 1 | 23 | 19 | 50 |
| 186 | 42.2428 | -6.8630 | 2001-02-21 | 11:30:00 | 71 | 15.2 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | 8 | 3 | 12 | 21 | 52 |
| 187 | 42.2881 | -8.3451 | 2001-02-21 | 11:30:00 | 77 | 20.2 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | 8 | 3 | 12 | 21 | 52 |
frp_firedata01_22.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 172 to 100087
Data columns (total 15 columns):
 #  Column  Non-Null Count  Dtype
 0  latitude  22277 non-null  float64
 1  longitude  22277 non-null  float64
 2  acq_date  22277 non-null  datetime64[ns]
 3  acq_time  22277 non-null  object
 4  confidence  22277 non-null  int64
 5  frp  22277 non-null  float64
 6  datetime  22277 non-null  datetime64[ns]
 7  year  22277 non-null  int64
 8  season  22277 non-null  int64
 9  month  22277 non-null  int64
 10  week  22277 non-null  UInt32
 11  day_of_week  22277 non-null  int64
 12  hour  22277 non-null  int64
 13  day_of_month  22277 non-null  int64
 14  day_of_year  22277 non-null  int64
dtypes: UInt32(1), datetime64[ns](2), float64(3), int64(8), object(1)
memory usage: 2.7+ MB
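The project relates fire detections to hourly weather parameters. One possible way to line the two tables up is to floor each detection time to the full hour and merge on the weather table's `time` column. This is only a sketch: the join actually used by the project is not shown in this section, the column names follow the tables above, and the temperature values are made up.

```python
import pandas as pd

weather = pd.DataFrame({
    "time": pd.to_datetime(["2001-02-17 11:00", "2001-02-17 12:00"]),
    "temperature_2m (°C)": [12.0, 13.5],  # made-up values
})
fires = pd.DataFrame({
    "datetime": pd.to_datetime(["2001-02-17 11:54"]),
    "frp": [5.6],
})

# Floor each detection to the start of its hour so it matches an hourly weather row
fires["time"] = fires["datetime"].dt.floor("60min")
fires_with_weather = fires.merge(weather, on="time", how="left")
print(fires_with_weather)
```

Whether to floor or to round to the nearest hour is a modelling choice; flooring pairs a detection at 11:54 with the 11:00 weather record.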
Dataset of total pollutant gases and particles released by wildfires between 2002 and 2023
galicia_total_pollutant = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\emission_gfed_full_2002_2023.csv")
galicia_total_pollutant = galicia_total_pollutant.query("country == 'Spain' and region == 'Galicia'")
galicia_total_pollutant = galicia_total_pollutant.drop(['gid_0', 'country','gid_1'], axis=1)
galicia_total_pollutant.head(5)
| | year | month | region | CO2 | CO | TPM | PM25 | TPC | NMHC | OC | CH4 | SO2 | BC | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 858 | 2002 | 1 | Galicia | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4468 | 2002 | 2 | Galicia | 22535.052 | 1202.291 | 228.750 | 165.325 | 125.476 | 112.970 | 118.445 | 47.629 | 13.854 | 7.024 | 29.088 |
| 8078 | 2002 | 3 | Galicia | 34622.885 | 1515.237 | 238.283 | 183.307 | 106.545 | 110.338 | 97.419 | 54.264 | 13.718 | 8.995 | 67.585 |
| 11688 | 2002 | 4 | Galicia | 80636.228 | 3897.548 | 672.776 | 498.031 | 335.511 | 326.068 | 312.258 | 148.580 | 39.561 | 23.075 | 133.829 |
| 15298 | 2002 | 5 | Galicia | 1879.408 | 70.227 | 9.475 | 7.992 | 3.344 | 3.790 | 2.921 | 2.163 | 0.535 | 0.412 | 4.347 |
galicia_total_pollutant.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 264 entries, 858 to 950288
Data columns (total 14 columns):
 #  Column  Non-Null Count  Dtype
 0  year  264 non-null  int64
 1  month  264 non-null  int64
 2  region  264 non-null  object
 3  CO2  264 non-null  float64
 4  CO  264 non-null  float64
 5  TPM  264 non-null  float64
 6  PM25  264 non-null  float64
 7  TPC  264 non-null  float64
 8  NMHC  264 non-null  float64
 9  OC  264 non-null  float64
 10  CH4  264 non-null  float64
 11  SO2  264 non-null  float64
 12  BC  264 non-null  float64
 13  NOx  264 non-null  float64
dtypes: float64(11), int64(2), object(1)
memory usage: 30.9+ KB
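The emissions table holds one row per month, so yearly totals follow from a group-by on `year`. The sketch below uses only the first five months of 2002 for two pollutants, copied from the table above; the resulting sums therefore cover January to May only and are not full-year figures.

```python
import pandas as pd

# First five months of 2002 for CO2 and CO, copied from the table above
monthly = pd.DataFrame({
    "year": [2002] * 5,
    "month": [1, 2, 3, 4, 5],
    "CO2": [0.000, 22535.052, 34622.885, 80636.228, 1879.408],
    "CO": [0.000, 1202.291, 1515.237, 3897.548, 70.227],
})

# Sum the monthly values within each year
yearly = monthly.groupby("year")[["CO2", "CO"]].sum()
print(yearly.round(3))
```

On the full dataset the same call would take the list of all pollutant columns in place of `["CO2", "CO"]`.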
Burnt-area dataset for fires between 2003 and 2018, and its parsing
galicia_burned_area_byfires_03_18 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\galicia_burned_area_bywildfires_03_18.csv")
galicia_burned_area_byfires_03_18 = galicia_burned_area_byfires_03_18.drop(['numeroparte', 'idcomunidad'], axis=1)
galicia_burned_area_byfires_03_18.head(5)
| | deteccion | idprovincia | burnt_area | latitude | longitude |
|---|---|---|---|---|---|
| 0 | 2003-01-15 18:30:00 | 15 | 0.50 | 43.501195 | -8.012159 |
| 1 | 2003-01-16 20:10:00 | 15 | 1.50 | 43.501195 | -8.012159 |
| 2 | 2003-01-17 08:50:00 | 15 | 2.05 | 42.988479 | -9.238336 |
| 3 | 2003-01-28 21:40:00 | 15 | 0.35 | 42.709977 | -8.787082 |
| 4 | 2003-02-13 13:55:00 | 15 | 0.01 | 43.520902 | -8.189201 |
# Converting 'deteccion' to datetime object
galicia_burned_area_byfires_03_18['deteccion'] = pd.to_datetime(galicia_burned_area_byfires_03_18['deteccion'])
# Extracting the year
galicia_burned_area_byfires_03_18['year'] = galicia_burned_area_byfires_03_18['deteccion'].dt.year
# Defining a function to assign seasons
def get_season(month):
    if month in [12, 1, 2]:
        return 1 # Winter
    elif month in [3, 4, 5]:
        return 2 # Spring
    elif month in [6, 7, 8]:
        return 3 # Summer
    else:
        return 4 # Autumn
# Applying the function to the DataFrame
galicia_burned_area_byfires_03_18['season'] = galicia_burned_area_byfires_03_18['deteccion'].dt.month.apply(get_season)
# Extracting the month
galicia_burned_area_byfires_03_18['month'] = galicia_burned_area_byfires_03_18['deteccion'].dt.month
# Extracting the week of the year
galicia_burned_area_byfires_03_18['week'] = galicia_burned_area_byfires_03_18['deteccion'].dt.isocalendar().week
# Extracting the day of the week (1 = Monday, 7 = Sunday)
galicia_burned_area_byfires_03_18['day_of_week'] = galicia_burned_area_byfires_03_18['deteccion'].dt.dayofweek + 1
# Extracting the hour, shifted by one so values run from 1 to 24 (hour 1 = 00:00-00:59)
galicia_burned_area_byfires_03_18['hour'] = galicia_burned_area_byfires_03_18['deteccion'].dt.hour + 1
# Extracting the day of the month
galicia_burned_area_byfires_03_18['day_of_month'] = galicia_burned_area_byfires_03_18['deteccion'].dt.day
# Extracting the day of the year
galicia_burned_area_byfires_03_18['day_of_year'] = galicia_burned_area_byfires_03_18['deteccion'].dt.dayofyear
galicia_burned_area_byfires_03_18.head(5)
| | deteccion | idprovincia | burnt_area | latitude | longitude | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003-01-15 18:30:00 | 15 | 0.50 | 43.501195 | -8.012159 | 2003 | 1 | 1 | 3 | 3 | 19 | 15 | 15 |
| 1 | 2003-01-16 20:10:00 | 15 | 1.50 | 43.501195 | -8.012159 | 2003 | 1 | 1 | 3 | 4 | 21 | 16 | 16 |
| 2 | 2003-01-17 08:50:00 | 15 | 2.05 | 42.988479 | -9.238336 | 2003 | 1 | 1 | 3 | 5 | 9 | 17 | 17 |
| 3 | 2003-01-28 21:40:00 | 15 | 0.35 | 42.709977 | -8.787082 | 2003 | 1 | 1 | 5 | 2 | 22 | 28 | 28 |
| 4 | 2003-02-13 13:55:00 | 15 | 0.01 | 43.520902 | -8.189201 | 2003 | 1 | 2 | 7 | 4 | 14 | 13 | 44 |
Why we chose Galicia: dataset for the line and bar plots comparing the regions of Spain
spain_avgburnedarea_avgfires_byregion_02_23 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Avg. Burned Area (ha) divided by Region Area (Km2) and Avg. Nr. of Fires Region Area (Km2) - [2002-2023].csv")
spain_avgburnedarea_avgfires_byregion_02_23.head(5)
| | Region | Burned Area | Nr. of Fires |
|---|---|---|---|
| 0 | Andalucía | 0.238 | 0.001 |
| 1 | Aragón | 0.084 | 0.000 |
| 2 | Cantabria | 0.237 | 0.002 |
| 3 | Castilla y León | 0.220 | 0.001 |
| 4 | Castilla-La Mancha | 0.078 | 0.000 |
spain_avgburnedarea_avgfires_byregion_02_23.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 3 columns):
 #  Column  Non-Null Count  Dtype
 0  Region  18 non-null  object
 1  Burned Area  18 non-null  float64
 2  Nr. of Fires  18 non-null  float64
dtypes: float64(2), object(1)
memory usage: 560.0+ bytes
Data showing how much of the forest loss is due to fires
spain_yearly_treecoverloss_byfires_01_23 = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\Spain_treecoverloss_yearly_01_23.csv")
spain_yearly_treecoverloss_byfires_01_23 = spain_yearly_treecoverloss_byfires_01_23.drop(['iso', 'adm1'], axis=1)
spain_yearly_treecoverloss_byfires_01_23.head(5)
| | umd_tree_cover_loss__year | umd_tree_cover_loss__ha | umd_tree_cover_loss_from_fires__ha |
|---|---|---|---|
| 0 | 2001 | 8700.494893 | 1039.163056 |
| 1 | 2002 | 10416.597912 | 2271.879901 |
| 2 | 2003 | 4315.377146 | 504.531020 |
| 3 | 2004 | 15337.191094 | 3345.148959 |
| 4 | 2005 | 10222.512235 | 1925.369063 |
spain_yearly_treecoverloss_byfires_01_23.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 3 columns):
 #  Column  Non-Null Count  Dtype
 0  umd_tree_cover_loss__year  23 non-null  int64
 1  umd_tree_cover_loss__ha  23 non-null  float64
 2  umd_tree_cover_loss_from_fires__ha  23 non-null  float64
dtypes: float64(2), int64(1)
memory usage: 680.0 bytes
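To express how much of the yearly tree cover loss is due to fires, the fire-related loss can be divided by the total loss. The sketch below uses the first three rows of the table above; `fire_share_pct` is a hypothetical column name introduced here for illustration.

```python
import pandas as pd

# First three rows of the tree cover loss table above
loss = pd.DataFrame({
    "umd_tree_cover_loss__year": [2001, 2002, 2003],
    "umd_tree_cover_loss__ha": [8700.494893, 10416.597912, 4315.377146],
    "umd_tree_cover_loss_from_fires__ha": [1039.163056, 2271.879901, 504.531020],
})

# Share of the yearly loss attributed to fires, in percent
loss["fire_share_pct"] = (
    100 * loss["umd_tree_cover_loss_from_fires__ha"] / loss["umd_tree_cover_loss__ha"]
)
print(loss[["umd_tree_cover_loss__year", "fire_share_pct"]].round(1))
```

For these rows the share is roughly 12% in 2001 and 2003 and about 22% in 2002.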
irre_data = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\irreplacibility.csv")
fire_data = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\fire_risk.csv")
firealerts_subregions = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\firealerts_subregion_galicia.csv")
treecover_subregions = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\treecover_subregion_galicia.csv")
selected_features = ['temperature_2m (°C)',
'relative_humidity_2m (%)',
'et0_fao_evapotranspiration (mm)',
'vapour_pressure_deficit (kPa)',
'wind_speed_10m (km/h)',
'soil_temperature_0_to_7cm (°C)',
'soil_moisture_0_to_7cm (m³/m³)',
'direct_normal_irradiance_instant (W/m²)']
galicia_weather[selected_features].describe().round(2)
| | temperature_2m (°C) | relative_humidity_2m (%) | et0_fao_evapotranspiration (mm) | vapour_pressure_deficit (kPa) | wind_speed_10m (km/h) | soil_temperature_0_to_7cm (°C) | soil_moisture_0_to_7cm (m³/m³) | direct_normal_irradiance_instant (W/m²) |
|---|---|---|---|---|---|---|---|---|
| count | 213288.00 | 213288.00 | 213288.00 | 213288.00 | 213288.00 | 213288.00 | 213288.00 | 213288.00 |
| mean | 11.73 | 82.25 | 0.10 | 0.32 | 12.27 | 12.57 | 0.31 | 185.55 |
| std | 6.06 | 14.77 | 0.15 | 0.42 | 6.68 | 6.18 | 0.10 | 283.67 |
| min | -5.80 | 19.00 | 0.00 | 0.00 | 0.00 | -2.40 | 0.09 | 0.00 |
| 25% | 7.50 | 73.00 | 0.00 | 0.07 | 7.10 | 8.00 | 0.23 | 0.00 |
| 50% | 11.30 | 87.00 | 0.02 | 0.16 | 11.00 | 11.90 | 0.34 | 0.00 |
| 75% | 15.50 | 94.00 | 0.14 | 0.41 | 16.50 | 16.70 | 0.39 | 317.80 |
| max | 36.30 | 100.00 | 0.80 | 4.23 | 55.10 | 34.40 | 0.44 | 983.90 |
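The 30-30-30 rule of thumb cited in the Motivation section can be applied directly to three of the selected columns. The sketch below assumes the rule as it is commonly stated (temperature of at least 30 °C, relative humidity of at most 30%, wind speed of at least 30 km/h); `flag_30_30_30` is a hypothetical helper, the demo values are made up, and this is not the project's prediction model.

```python
import pandas as pd


def flag_30_30_30(weather):
    """Boolean Series: True where all three 30-30-30 conditions hold."""
    return (
        (weather["temperature_2m (°C)"] >= 30)
        & (weather["relative_humidity_2m (%)"] <= 30)
        & (weather["wind_speed_10m (km/h)"] >= 30)
    )


# Made-up hourly rows: a typical hour, a hot/dry/windy hour, a hot but humid hour
demo_weather = pd.DataFrame({
    "temperature_2m (°C)": [11.3, 33.0, 31.0],
    "relative_humidity_2m (%)": [87, 22, 45],
    "wind_speed_10m (km/h)": [11.0, 35.0, 40.0],
})
demo_weather["rule_30_30_30"] = flag_30_30_30(demo_weather)
print(demo_weather)
```

Given the summary above (75% of hours are at or below 15.5 °C and at or above 73% humidity), hours satisfying all three conditions are likely to be rare in this dataset.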
filtered_galicia_fires_00_22.describe().round(2)
| | latitude | longitude | brightness | scan | track | confidence | bright_t31 | frp | type | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 | 22277.00 |
| mean | 42.46 | -7.81 | 325.26 | 1.73 | 1.25 | 73.45 | 293.05 | 66.26 | 0.01 | 2009.29 | 2.94 | 7.26 | 29.55 | 4.06 | 16.20 | 14.62 | 204.34 |
| std | 0.40 | 0.71 | 22.83 | 0.89 | 0.27 | 23.24 | 10.20 | 125.29 | 0.12 | 5.97 | 0.85 | 2.48 | 10.57 | 2.10 | 5.73 | 7.74 | 74.30 |
| min | 41.81 | -9.27 | 300.00 | 1.00 | 1.00 | 0.00 | 265.10 | 0.00 | 0.00 | 2001.00 | 1.00 | 1.00 | 1.00 | 1.00 | 3.00 | 1.00 | 1.00 |
| 25% | 42.14 | -8.45 | 309.70 | 1.10 | 1.00 | 59.00 | 286.40 | 14.80 | 0.00 | 2005.00 | 3.00 | 7.00 | 27.00 | 2.00 | 12.00 | 8.00 | 188.00 |
| 50% | 42.40 | -7.78 | 319.20 | 1.40 | 1.20 | 77.00 | 292.20 | 30.20 | 0.00 | 2006.00 | 3.00 | 8.00 | 32.00 | 4.00 | 14.00 | 14.00 | 221.00 |
| 75% | 42.74 | -7.18 | 333.70 | 2.10 | 1.40 | 95.00 | 299.90 | 66.90 | 0.00 | 2013.00 | 4.00 | 9.00 | 36.00 | 6.00 | 23.00 | 20.00 | 248.00 |
| max | 43.73 | -6.73 | 505.40 | 4.80 | 2.00 | 100.00 | 400.10 | 2956.20 | 3.00 | 2022.00 | 4.00 | 12.00 | 53.00 | 7.00 | 24.00 | 31.00 | 366.00 |
galicia_burned_area_byfires_03_18.describe().round(2)
| | idprovincia | burnt_area | latitude | longitude | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 72757.00 | 72757.00 | 72575.00 | 72575.00 | 72757.00 | 72757.00 | 72757.00 | 72757.00 | 72757.00 | 72757.00 | 72757.00 | 72757.00 |
| mean | 28.26 | 5.09 | 42.54 | -8.07 | 2007.73 | 2.79 | 6.53 | 26.60 | 4.14 | 15.98 | 15.65 | 183.25 |
| std | 8.02 | 60.08 | 0.53 | 0.71 | 4.00 | 0.92 | 2.65 | 11.46 | 2.02 | 6.14 | 8.35 | 80.36 |
| min | 15.00 | 0.00 | 4.67 | -9.43 | 2003.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 25% | 27.00 | 0.05 | 42.17 | -8.57 | 2004.00 | 2.00 | 4.00 | 15.00 | 2.00 | 14.00 | 9.00 | 104.00 |
| 50% | 32.00 | 0.23 | 42.44 | -8.10 | 2006.00 | 3.00 | 7.00 | 30.00 | 4.00 | 17.00 | 16.00 | 207.00 |
| 75% | 36.00 | 1.00 | 42.88 | -7.60 | 2011.00 | 3.00 | 8.00 | 35.00 | 6.00 | 20.00 | 22.00 | 243.00 |
| max | 36.00 | 7352.14 | 78.64 | 47.51 | 2018.00 | 4.00 | 12.00 | 53.00 | 7.00 | 24.00 | 31.00 | 366.00 |
galicia_total_pollutant.describe().round(2)
| | year | month | CO2 | CO | TPM | PM25 | TPC | NMHC | OC | CH4 | SO2 | BC | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 | 264.00 |
| mean | 2012.50 | 6.50 | 51182.32 | 2417.84 | 411.11 | 307.18 | 201.81 | 196.74 | 187.38 | 90.63 | 24.15 | 14.29 | 88.06 |
| std | 6.36 | 3.46 | 249922.85 | 11604.11 | 1980.41 | 1495.35 | 979.10 | 929.36 | 910.65 | 425.13 | 117.54 | 68.02 | 435.89 |
| min | 2002.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 25% | 2007.00 | 3.75 | 370.15 | 17.43 | 2.35 | 1.98 | 1.04 | 1.20 | 0.98 | 0.61 | 0.13 | 0.10 | 0.58 |
| 50% | 2012.50 | 6.50 | 6157.90 | 314.05 | 50.55 | 39.04 | 22.74 | 24.87 | 20.41 | 12.09 | 2.85 | 1.82 | 10.54 |
| 75% | 2018.00 | 9.25 | 20546.72 | 981.43 | 177.45 | 127.99 | 93.03 | 89.07 | 87.47 | 39.37 | 10.24 | 5.87 | 36.76 |
| max | 2023.00 | 12.00 | 3546581.06 | 163093.14 | 27643.65 | 20972.60 | 13548.09 | 12881.31 | 12582.94 | 5923.37 | 1640.21 | 954.82 | 6250.45 |
Optional CSV export, provided for convenience:
#datafilename.to_csv('data.csv', index=False)
2 - Data Visualizations and EDA¶
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))
# Plotting the bar chart
ax1.bar(spain_avgburnedarea_avgfires_byregion_02_23['Region'], spain_avgburnedarea_avgfires_byregion_02_23['Burned Area'], color='maroon', alpha=0.7, label='Burned Area')
ax1.set_xlabel('Regions',fontsize=13)
ax1.set_ylabel('Burned Area',fontsize=15)
# Rotating the existing tick labels (set_xticklabels alone raises a FixedFormatter warning here)
plt.setp(ax1.get_xticklabels(), rotation=45, ha='right')
# Creating a second y-axis to plot the line chart
ax2 = ax1.twinx()
ax2.plot(spain_avgburnedarea_avgfires_byregion_02_23['Region'], spain_avgburnedarea_avgfires_byregion_02_23['Nr. of Fires'], color='orangered', marker='o', label='Nr. of Fires')
ax2.set_ylabel('Nr. of Fires',fontsize=15)
# Adding title and legend
plt.title('Mean Burned Area in ha per $Km^{2}$ Area and Mean Wildfire Incidents per $Km^{2}$ Area by Regions of Spain - [2002-2023]')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
# Showing plot
plt.tight_layout()
plt.show()
Galicia is the most critical region of Spain in terms of area burned by wildfires between 2002 and 2023. Source: https://gwis.jrc.ec.europa.eu/apps/country.profile/downloads
# Plot
fig, ax = plt.subplots(figsize=(12, 8))
# Plotting the total loss bars
ax.bar(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__ha'], label='Total Annual Tree Cover Loss for the Related Year', color='darkolivegreen')
# Plotting the loss from fires as an overlay, using the same base x-coordinates
ax.bar(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss_from_fires__ha'], label='Annual Amount of Tree Cover Loss due to Wildfires happened in the Related Year', color='brown')
# Adding labels, title, and gridlines
ax.set_xlabel('Years', fontsize=14)
ax.set_ylabel('Tree Cover Loss (ha)', fontsize=15)
ax.set_title('Total Annual Tree Cover Loss and Annual Tree Cover Loss due to Wildfires in Galicia, Spain', fontsize=14)
ax.legend()
ax.grid(axis='y', linestyle='--', alpha=0.6)
# Adjusting the x-axis ticks
ax.set_xticks(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'])
ax.set_xticklabels(spain_yearly_treecoverloss_byfires_01_23['umd_tree_cover_loss__year'], rotation=45)
# Increasing the number of y-axis ticks
ax.yaxis.set_major_locator(plt.MaxNLocator(10))
# Display the plot
plt.tight_layout()
plt.show()
The bars show the total annual tree cover loss, with the part of that loss caused by wildfires overlaid in brown. Source: https://www.globalforestwatch.org/dashboards/global/
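To put a number on the overlay, the fire-related share of each year's loss can be computed directly. The following is a minimal sketch, assuming the two column names used in the plot above; the helper `fire_share` and the values in `toy` are hypothetical placeholders, not figures from the Global Forest Watch data.

```python
import pandas as pd

# Hypothetical placeholder values; only the column names come from the notebook
toy = pd.DataFrame({
    'umd_tree_cover_loss__year': [2001, 2002, 2003],
    'umd_tree_cover_loss__ha': [1000.0, 2000.0, 0.0],
    'umd_tree_cover_loss_from_fires__ha': [250.0, 500.0, 0.0],
})


def fire_share(df):
    """Return the percentage of annual tree cover loss attributed to fires."""
    total = df['umd_tree_cover_loss__ha']
    fires = df['umd_tree_cover_loss_from_fires__ha']
    # Years with zero total loss yield NaN instead of a division by zero
    return (fires / total.where(total > 0) * 100).round(1)


toy['fire_share_pct'] = fire_share(toy)
print(toy[['umd_tree_cover_loss__year', 'fire_share_pct']])
```

Applied to `spain_yearly_treecoverloss_byfires_01_23`, the same function would give one percentage per year, which could be annotated on top of the bars.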
firealerts_subregions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1012 entries, 0 to 1011
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   subregion     1012 non-null   int64
 1   alert__count  1012 non-null   int64
dtypes: int64(2)
memory usage: 15.9 KB
treecover_subregions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   subregion  4 non-null      int64
 1   area__ha   4 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 192.0 bytes
# Mapping of subregion numbers to names
subregion_map = {1: 'A Coruna', 2: 'Lugo', 3: 'Ourense', 4: 'Pontevedra'}
# Aggregating the tree cover by subregion
treecover_agg = treecover_subregions.groupby('subregion')['area__ha'].sum()
# Replacing subregion numbers with names
treecover_agg.index = treecover_agg.index.map(subregion_map)
# Function to format labels with percentage and area
def autopct_format(values):
    def my_format(pct):
        total = sum(values)
        absolute = round(pct / 100 * total)
        return f'{pct:.1f}% ({absolute} ha)'
    return my_format
# Setting the pastel colors for each subregion
colors = ['#FFB3B3', # Pastel light red for A Coruna
'#B3CFFF', # Pastel light blue for Lugo
'#D9B3FF', # Pastel light purple for Ourense
'#FFFFB3'] # Pastel yellow for Pontevedra
# Creating a pie chart with specified colors
fig, ax = plt.subplots(figsize=(10, 10))
ax.pie(treecover_agg, labels=treecover_agg.index, autopct=autopct_format(treecover_agg),
startangle=140, colors=colors, textprops={'fontsize': 14}, wedgeprops={'edgecolor': 'black'})
ax.set_title('Tree Cover Distribution by Subregions of Galicia, Spain', fontsize=16)
plt.show()
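The `autopct` callback used above receives only the wedge percentage from matplotlib, so the absolute area has to be recovered from the total. A small standalone sketch of that closure, run without plotting; the input areas are hypothetical values chosen for illustration.

```python
def autopct_format(values):
    """Build a matplotlib autopct callback that also prints the absolute area."""
    total = sum(values)

    def my_format(pct):
        # matplotlib passes the wedge percentage; convert it back to hectares
        absolute = round(pct / 100 * total)
        return f'{pct:.1f}% ({absolute} ha)'

    return my_format


# Hypothetical areas in hectares, for illustration only
formatter = autopct_format([100, 300])
label = formatter(25.0)
print(label)
```

Because the label is rebuilt from a percentage, it can differ from the true area by rounding when the percentages are not exact.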
# Aggregating the alert counts by subregion
firealerts_agg = firealerts_subregions.groupby('subregion')['alert__count'].sum()
# Replacing subregion numbers with names
firealerts_agg.index = firealerts_agg.index.map(subregion_map)
# Sorting the data by alert count
firealerts_agg = firealerts_agg.sort_values(ascending=False)
# Creating a color map for each subregion
colors = ['#A52A2A', '#FF8C00', '#FFD700', '#FFE4B5']
color_map = dict(zip(firealerts_agg.index, colors))
# Creating the plot
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(firealerts_agg.index, firealerts_agg, color=[color_map[subregion] for subregion in firealerts_agg.index], edgecolor='black')
# Adding dots at the end of bars
for bar in bars:
    ax.plot(bar.get_width(), bar.get_y() + bar.get_height() / 2, 'o', color=bar.get_facecolor(), markersize=12)
# Adding alert counts at the end of the bars
for bar in bars:
    ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height() / 2,
            f'{int(bar.get_width())}', va='center', fontsize=12)
# Adjusting the plot
ax.set_xlabel('Total Alerts')
ax.set_title('Total Alerts by Subregions of Galicia, Spain')
ax.invert_yaxis()
plt.show()
# Filtering the data based on the criteria
filtered = galicia_burned_area_byfires_03_18[
    (galicia_burned_area_byfires_03_18['longitude'] < -6.73) &
    (galicia_burned_area_byfires_03_18['longitude'] > -9.3) &
    (galicia_burned_area_byfires_03_18['latitude'] > 41.8) &
    (galicia_burned_area_byfires_03_18['latitude'] < 43.8) &
    (galicia_burned_area_byfires_03_18['burnt_area'] > 100)
]
# Creating a map centered around Galicia
map_galicia = folium.Map(location=[42.7, -8.015], zoom_start=8)
folium.TileLayer('cartodbdark_matter').add_to(map_galicia)
# Adding title using custom HTML
title_html = '''
<h3 align="center" style="font-size:20px"><b>Historical Wildfires in Galicia That Burned More Than 100 ha</b></h3>
'''
map_galicia.get_root().html.add_child(folium.Element(title_html))
# Defining the color mapping for each 'idprovincia'
color_map = {
15: 'red', # A Coruña
27: 'blue', # Lugo
32: 'purple', # Ourense
36: 'yellow' # Pontevedra
}
# Adding circles for wildfire incidents
for idx, row in filtered.iterrows():
    color = color_map.get(row['idprovincia'], 'blue')
    tooltip_text = f"Date & Time: {row['deteccion']}<br>Burnt Area: {row['burnt_area']} ha"
    folium.Circle(
        location=[row['latitude'], row['longitude']],
        radius=row['burnt_area'] * 1,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.5,
        tooltip=tooltip_text
    ).add_to(map_galicia)
# Adding custom legend
legend_html = '''
<div style="position: fixed;
bottom: 50px; left: 50px; width: 200px; height: 130px;
border:2px solid grey; z-index:9999; font-size:14px;
background-color: white; opacity: 0.9;
">
<b>Subregions of Galicia</b>
<br>
<i style="background: red; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>A Coruña
<br>
<i style="background: blue; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Lugo
<br>
<i style="background: purple; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Ourense
<br>
<i style="background: yellow; width: 18px; height: 18px; float: left; margin-right: 8px;"></i>Pontevedra
</div>
'''
map_galicia.get_root().html.add_child(folium.Element(legend_html))
map_galicia.save(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\map_galicia.html')
# Displaying the map
map_galicia
# Creating a map centered on Galicia
heatmap_galicia = folium.Map(location=[42.7, -8.015], zoom_start=8)
# Preparing data for the heatmap; FRP is normalized by dividing by its maximum value
frp_max = filtered_galicia_fires_00_22['frp'].max()  # computed once instead of once per row
lat_longs = [[row['latitude'], row['longitude'], row['frp'] / frp_max]
             for _, row in filtered_galicia_fires_00_22.iterrows()]
# Adding HeatMap to the folium map
HeatMap(
    lat_longs,
    radius=2.1,  # Adjust radius for heatmap circles
    blur=2.5,  # Adjust blur for smoother heatmap
    max_zoom=10000,  # Adjust for better visualization
    min_opacity=0.80  # Adjust for a clearer distinction
).add_to(heatmap_galicia)
# Adding title using custom HTML
title_html = '''
<h3 align="center" style="font-size:20px"><b>Galician Zones Wildfire Severity Heatmap by Considering the Fire Radiative Power (FRP) in Megawatts of Historical Wildfire Incidents 2001-2022 </b></h3>
'''
heatmap_galicia.get_root().html.add_child(folium.Element(title_html))
# Adding custom legend for the color bar
color_bar_html = '''
<div style="position: fixed;
bottom: 50px; left: 50px; width: 120px; height: 150px;
border:2px solid grey; z-index:9999; font-size:14px;
background-color:white; opacity: 0.85;">
<b>FRP Range</b><br>
<i style="background: #00FF00; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>Low<br>
<i style="background: #FFFF00; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>Medium<br>
<i style="background: #FF0000; width: 20px; height: 20px; float: left; margin-right: 5px;"></i>High<br>
</div>
'''
heatmap_galicia.get_root().html.add_child(folium.Element(color_bar_html))
heatmap_galicia.save(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\heatmap_galicia.html')
# Displaying the map
heatmap_galicia
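The weighted points for the heatmap can also be built without a row-by-row loop. This is a sketch under stated assumptions: the helper `heatmap_points` and the sample rows are hypothetical, and only the column names `latitude`, `longitude` and `frp` are taken from the notebook.

```python
import pandas as pd

# Hypothetical detections; only the column names mirror the notebook
fires = pd.DataFrame({
    'latitude': [42.5, 42.3, 43.0],
    'longitude': [-8.4, -8.2, -7.5],
    'frp': [10.0, 50.0, 200.0],
})


def heatmap_points(df):
    """Return [lat, lon, weight] rows with FRP scaled to the 0-1 range."""
    weights = df['frp'] / df['frp'].max()
    return (
        df[['latitude', 'longitude']]
        .assign(weight=weights)
        .values
        .tolist()
    )


points = heatmap_points(fires)
print(points)
```

The resulting nested list has the same `[latitude, longitude, weight]` shape as `lat_longs`, so it could be passed to `HeatMap` in the same way.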
# Aggregating total CO2 emissions by month
monthly_burnt_area = galicia_total_pollutant.groupby('month')['CO2'].sum().reset_index()
# Creating month names list
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Mapping month numbers to names
monthly_burnt_area['month_name'] = monthly_burnt_area['month'].apply(lambda x: month_names[x-1])
# Mapping months to angles in the radial plot
angle_mapping = {1: 0, 2: 30, 3: 60, 4: 90, 5: 120, 6: 150, 7: 180, 8: 210, 9: 240, 10: 270, 11: 300, 12: 330}
monthly_burnt_area['theta'] = monthly_burnt_area['month'].map(angle_mapping)
# Creating radial polar plot
fig = go.Figure()
fig.add_trace(go.Barpolar(
    r=monthly_burnt_area['CO2'],
    theta=monthly_burnt_area['theta'],
    marker_color=[px.colors.sequential.Reds[i % len(px.colors.sequential.Reds)] for i in range(len(monthly_burnt_area))],
    marker_line_color='black',
    marker_line_width=1,
    opacity=0.8
))
# Layout adjustments
fig.update_layout(
    title={
        'text': 'Monthly Distribution of Total Emitted CO<sub>2</sub> Greenhouse Gas in Metric Tons (tonnes) of Wildfires in Galicia 2002-2023, per kilogram of dry matter burned',
        'font': {
            'size': 12
        }
    },
    polar=dict(
        radialaxis=dict(visible=True, range=[0, monthly_burnt_area['CO2'].max() + 5]),
        angularaxis=dict(
            tickmode='array',
            tickvals=list(angle_mapping.values()),
            ticktext=month_names
        )
    ),
    template="plotly_dark"
)
fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\polarplot.html')
fig.show()
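The hand-written `angle_mapping` places the twelve months at equal 30-degree steps around the circle. As a small sketch, the same dictionary can be derived from that rule instead of being typed out:

```python
# Twelve months spread evenly over 360 degrees: 30 degrees per month
angle_mapping = {month: (month - 1) * 30 for month in range(1, 13)}
print(angle_mapping)
```

January lands at 0 degrees and December at 330 degrees, matching the tick positions used for the polar axis.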
# Pivoting the data
pivoted_data = pd.pivot_table(galicia_total_pollutant, index='month', values=['CO', 'TPM', 'PM25', 'TPC', 'NMHC', 'OC', 'CH4', 'SO2', 'BC', 'NOx'], aggfunc='sum')
# Creating traces for each pollutant
pollutant_traces = []
pollutants = pivoted_data.columns
colors = px.colors.qualitative.Pastel
# Adding CO separately
co_trace = go.Bar(
    y=[month - 0.2 for month in pivoted_data.index],
    x=pivoted_data['CO'],
    name='CO',
    orientation='h',
    marker=dict(color='orange'),
    hoverinfo='x+y+name',
    width=0.4  # Adjust bar width
)
pollutant_traces.append(co_trace)
# Adding other pollutants, excluding CO
stacked_traces = []
for i, pollutant in enumerate([p for p in pollutants if p != 'CO']):
    trace = go.Bar(
        y=[month + 0.2 for month in pivoted_data.index],
        x=pivoted_data[pollutant],
        name=pollutant,
        orientation='h',
        marker=dict(color=colors[i % len(colors)]),
        hoverinfo='x+y+name',
        width=0.4  # Adjust bar width
    )
    stacked_traces.append(trace)
# Creating tick intervals at every 25k
max_val = pivoted_data.sum().max()
tick_vals = list(range(0, int(max_val + 25000), 25000))
# Creating the layout with increased height
layout = go.Layout(
    title='Monthly Emissions by Other Pollutants released by Wildfire Incidents in Galicia between 2002-2023',
    barmode='stack',
    xaxis=dict(
        title='Total Emissions in Metric Tons (tonnes)',
        showgrid=True,
        tickvals=tick_vals,
        ticktext=[f'{val // 1000}k' for val in tick_vals],
        tickfont=dict(size=14)
    ),
    yaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
        tickfont=dict(size=14)
    ),
    legend=dict(
        title=dict(
            text='Pollutants',
            font=dict(size=16)
        ),
        x=1.05,
        y=1,
        font=dict(size=12)
    ),
    template='plotly_white',
    height=800
)
# Creating the figure
fig = go.Figure(data=pollutant_traces + stacked_traces, layout=layout)
fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\horizontal_other_polutants.html')
# Showing the interactive plot
fig.show()
# Converting 'datetime' to a date to group by day
filtered_galicia_fires_00_22['acq_date'] = filtered_galicia_fires_00_22['datetime'].dt.date
# Using 'acq_date' as the index so that the grouping below runs on the dates
filtered_galicia_fires_00_22.set_index('acq_date', inplace=True)
# Grouping by 'acq_date' and counting the number of wildfires per day
wildfire_count = filtered_galicia_fires_00_22.groupby(filtered_galicia_fires_00_22.index).size()
# Converting the index to a DatetimeIndex
wildfire_count.index = pd.to_datetime(wildfire_count.index)
# Creating a custom colormap based on the number of wildfires
# Each shade is repeated five times, giving 50 color steps in total
base_colors = ['lightsalmon', 'salmon', 'tomato', 'red', 'crimson',
               'firebrick', 'brown', 'darkred', 'maroon', 'black']
cmap_colors = [color for color in base_colors for _ in range(5)]
custom_cmap = ListedColormap(cmap_colors)
# Plotting the calendar plot
calplot.calplot(wildfire_count, cmap=custom_cmap, vmin=0, vmax=len(cmap_colors), edgecolor='white', linewidth=0.5)
# Adding title with y offset
plt.suptitle('Calendar Plot Showing the Frequency of Daily Wildfire Incidents (2001-2022)', fontsize=16,x=0.45, y=1.0)
# Adjusting layout
plt.tight_layout(rect=[0, 0, 0.80, 0.99])
# Showing the plot
plt.show()
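The daily series behind the calendar plot can also be built by resampling on the timestamp, which leaves the original index untouched and fills days without detections with zero. A minimal sketch, assuming a `datetime` column as in the notebook; the four timestamps are hypothetical sample rows.

```python
import pandas as pd

# Hypothetical detection timestamps, for illustration only
detections = pd.DataFrame({
    'datetime': pd.to_datetime([
        '2001-02-17 11:54', '2001-02-19 11:42',
        '2001-02-19 22:48', '2001-02-21 11:30',
    ])
})

# Count detections per calendar day; days without detections become 0
daily_counts = (
    detections.set_index('datetime')
    .resample('D')
    .size()
)
print(daily_counts)
```

The result already has a `DatetimeIndex`, so no separate `pd.to_datetime` conversion of the index is needed in this variant.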
# Ensuring that the common columns are in the same dtype
common_columns = ['year', 'season', 'month', 'week', 'day_of_week', 'hour','day_of_month', 'day_of_year']
# Converting the data types for the common columns to match between both datasets
frp_firedata01_22 = frp_firedata01_22.astype({col: 'int64' for col in common_columns})
galicia_weather = galicia_weather.astype({col: 'int64' for col in common_columns})
# Performing the merge
merged_hourly_weather_frp_data = pd.merge(
    frp_firedata01_22,
    galicia_weather,
    on=common_columns,
    how='left'  # Keeps every fire record; weather columns are NaN where no matching hour exists
)
# Displaying the first few rows of the merged data
merged_hourly_weather_frp_data.head()
| | latitude | longitude | acq_date | acq_time | confidence | frp | datetime | year | season | month | ... | soil_moisture_7_to_28cm (m³/m³) | soil_moisture_28_to_100cm (m³/m³) | soil_moisture_100_to_255cm (m³/m³) | is_day () | sunshine_duration (s) | shortwave_radiation_instant (W/m²) | direct_radiation_instant (W/m²) | diffuse_radiation_instant (W/m²) | direct_normal_irradiance_instant (W/m²) | terrestrial_radiation_instant (W/m²) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42.5118 | -8.4374 | 2001-02-17 | 11:54:00 | 36 | 5.6 | 2001-02-17 11:54:00 | 2001 | 1 | 2 | ... | 0.379 | 0.399 | 0.422 | 1 | 3600.0 | 169.4 | 89.9 | 79.5 | 349.9 | 359.8 |
| 1 | 42.2953 | -8.2946 | 2001-02-19 | 11:42:00 | 60 | 8.7 | 2001-02-19 11:42:00 | 2001 | 1 | 2 | ... | 0.370 | 0.392 | 0.421 | 1 | 3600.0 | 224.7 | 160.9 | 63.8 | 604.1 | 372.9 |
| 2 | 42.2688 | -8.2864 | 2001-02-19 | 22:48:00 | 83 | 16.2 | 2001-02-19 22:48:00 | 2001 | 1 | 2 | ... | 0.367 | 0.391 | 0.420 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 42.2428 | -6.8630 | 2001-02-21 | 11:30:00 | 71 | 15.2 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | ... | 0.363 | 0.387 | 0.419 | 1 | 3600.0 | 232.8 | 165.7 | 67.1 | 600.0 | 386.2 |
| 4 | 42.2881 | -8.3451 | 2001-02-21 | 11:30:00 | 77 | 20.2 | 2001-02-21 11:30:00 | 2001 | 1 | 2 | ... | 0.363 | 0.387 | 0.419 | 1 | 3600.0 | 232.8 | 165.7 | 67.1 | 600.0 | 386.2 |
5 rows × 42 columns
merged_hourly_weather_frp_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22277 entries, 0 to 22276
Data columns (total 42 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   latitude  22277 non-null  float64
 1   longitude  22277 non-null  float64
 2   acq_date  22277 non-null  datetime64[ns]
 3   acq_time  22277 non-null  object
 4   confidence  22277 non-null  int64
 5   frp  22277 non-null  float64
 6   datetime  22277 non-null  datetime64[ns]
 7   year  22277 non-null  int64
 8   season  22277 non-null  int64
 9   month  22277 non-null  int64
 10  week  22277 non-null  int64
 11  day_of_week  22277 non-null  int64
 12  hour  22277 non-null  int64
 13  day_of_month  22277 non-null  int64
 14  day_of_year  22277 non-null  int64
 15  time  22277 non-null  datetime64[ns]
 16  temperature_2m (°C)  22277 non-null  float64
 17  relative_humidity_2m (%)  22277 non-null  int64
 18  dew_point_2m (°C)  22277 non-null  float64
 19  precipitation (mm)  22277 non-null  float64
 20  pressure_msl (hPa)  22277 non-null  float64
 21  surface_pressure (hPa)  22277 non-null  float64
 22  cloud_cover (%)  22277 non-null  int64
 23  et0_fao_evapotranspiration (mm)  22277 non-null  float64
 24  vapour_pressure_deficit (kPa)  22277 non-null  float64
 25  wind_speed_10m (km/h)  22277 non-null  float64
 26  wind_gusts_10m (km/h)  22277 non-null  float64
 27  soil_temperature_0_to_7cm (°C)  22277 non-null  float64
 28  soil_temperature_7_to_28cm (°C)  22277 non-null  float64
 29  soil_temperature_28_to_100cm (°C)  22277 non-null  float64
 30  soil_temperature_100_to_255cm (°C)  22277 non-null  float64
 31  soil_moisture_0_to_7cm (m³/m³)  22277 non-null  float64
 32  soil_moisture_7_to_28cm (m³/m³)  22277 non-null  float64
 33  soil_moisture_28_to_100cm (m³/m³)  22277 non-null  float64
 34  soil_moisture_100_to_255cm (m³/m³)  22277 non-null  float64
 35  is_day ()  22277 non-null  int64
 36  sunshine_duration (s)  22277 non-null  float64
 37  shortwave_radiation_instant (W/m²)  22277 non-null  float64
 38  direct_radiation_instant (W/m²)  22277 non-null  float64
 39  diffuse_radiation_instant (W/m²)  22277 non-null  float64
 40  direct_normal_irradiance_instant (W/m²)  22277 non-null  float64
 41  terrestrial_radiation_instant (W/m²)  22277 non-null  float64
dtypes: datetime64[ns](3), float64(26), int64(12), object(1)
memory usage: 7.3+ MB
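A left merge like the one above silently duplicates fire rows if a weather hour appears more than once, and leaves NaN where no hour matches. The sketch below shows how the `validate` and `indicator` options of `pd.merge` can surface both cases. The two miniature tables are hypothetical, and for brevity they are joined on three keys rather than the eight common columns used in the notebook.

```python
import pandas as pd

keys = ['year', 'day_of_year', 'hour']

# Hypothetical miniature versions of the fire and weather tables
fires = pd.DataFrame({
    'year': [2001, 2001, 2001],
    'day_of_year': [48, 50, 50],
    'hour': [12, 12, 23],
    'frp': [5.6, 8.7, 16.2],
})
weather = pd.DataFrame({
    'year': [2001, 2001],
    'day_of_year': [48, 50],
    'hour': [12, 12],
    'temperature_2m (°C)': [14.1, 15.3],
})

merged = pd.merge(
    fires, weather, on=keys, how='left',
    validate='many_to_one',  # raises MergeError if a weather hour appears twice
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
unmatched = int((merged['_merge'] == 'left_only').sum())
print(f'{unmatched} fire rows have no matching weather hour')
```

In the notebook's own output every weather column is non-null after the merge, so a check of this kind would be expected to report zero unmatched rows there.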
# Keeping only high-confidence detections (confidence above 90)
high_confidence_merged_hourly_weather_frp_data = merged_hourly_weather_frp_data[merged_hourly_weather_frp_data['confidence'] > 90]
# Grouping firedata by year, day_of_year, and hour to get fire counts
fire_grouped = frp_firedata01_22.groupby(['year', 'day_of_year', 'hour']).size().reset_index(name='fire_count')
# Merging the fire counts with galicia_weather
galicia_weather_firecount_merged = pd.merge(galicia_weather, fire_grouped, on=['year', 'day_of_year', 'hour'], how='left')
# Replacing NaN with 0, since NaN indicates no fire incidents at that time
galicia_weather_firecount_merged['fire_count'] = galicia_weather_firecount_merged['fire_count'].fillna(0).astype(int)
# Displaying the first few rows of the new dataset
galicia_weather_firecount_merged.head()
| | time | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | vapour_pressure_deficit (kPa) | ... | terrestrial_radiation_instant (W/m²) | year | season | month | week | day_of_week | hour | day_of_month | day_of_year | fire_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01 00:00:00 | 1.9 | 83 | -0.7 | 0.0 | 1029.0 | 963.6 | 29 | 0.0 | 0.12 | ... | 0.0 | 2000 | 1 | 1 | 52 | 6 | 1 | 1 | 1 | 0 |
| 1 | 2000-01-01 01:00:00 | 1.3 | 85 | -0.9 | 0.0 | 1029.2 | 963.6 | 24 | 0.0 | 0.10 | ... | 0.0 | 2000 | 1 | 1 | 52 | 6 | 2 | 1 | 1 | 0 |
| 2 | 2000-01-01 02:00:00 | -0.1 | 89 | -1.7 | 0.0 | 1029.3 | 963.4 | 24 | 0.0 | 0.07 | ... | 0.0 | 2000 | 1 | 1 | 52 | 6 | 3 | 1 | 1 | 0 |
| 3 | 2000-01-01 03:00:00 | -1.7 | 92 | -2.9 | 0.0 | 1028.8 | 962.6 | 16 | 0.0 | 0.05 | ... | 0.0 | 2000 | 1 | 1 | 52 | 6 | 4 | 1 | 1 | 0 |
| 4 | 2000-01-01 04:00:00 | -2.2 | 92 | -3.3 | 0.0 | 1028.7 | 962.4 | 8 | 0.0 | 0.04 | ... | 0.0 | 2000 | 1 | 1 | 52 | 6 | 5 | 1 | 1 | 0 |
5 rows × 36 columns
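Since most hours have no fire at all, it can help to quantify how sparse `fire_count` is before modelling. A minimal sketch on a hypothetical miniature table; the column name `fire_count` follows the notebook, while the values and the helper `fire_hour_share` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical hourly table with a fire_count column, as produced by the merge above
hourly = pd.DataFrame({
    'hour': [11, 12, 13, 14],
    'fire_count': [0, 2, 0, 1],
})


def fire_hour_share(df):
    """Return the fraction of hourly rows with at least one fire."""
    return float((df['fire_count'] > 0).mean())


share = fire_hour_share(hourly)
print(f'{share:.0%} of hours have at least one fire')
```

A very small share would indicate a strongly imbalanced target, which is worth keeping in mind when choosing evaluation metrics.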
# Filtering out rows with fire_count equal to 0
galicia_weather_nonzero_fire = galicia_weather_firecount_merged[galicia_weather_firecount_merged['fire_count'] != 0]
# Ensuring that the common columns are in the same dtype
common_columns = ['year', 'season', 'month', 'week', 'day_of_week', 'hour','day_of_month', 'day_of_year']
# Converting the data types for the common columns to match between both datasets
galicia_burned_area_byfires_03_18 = galicia_burned_area_byfires_03_18.astype({col: 'int64' for col in common_columns})
galicia_weather = galicia_weather.astype({col: 'int64' for col in common_columns})
# Performing the merge
merged_weather_burntarea_data = pd.merge(
    galicia_burned_area_byfires_03_18,
    galicia_weather,
    on=common_columns,
    how='left'  # Keeps every burned-area record; weather columns are NaN where no matching hour exists
)
# Summarizing the columns of the merged data
merged_weather_burntarea_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 72757 entries, 0 to 72756
Data columns (total 40 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   deteccion  72757 non-null  datetime64[ns]
 1   idprovincia  72757 non-null  int64
 2   burnt_area  72757 non-null  float64
 3   latitude  72575 non-null  float64
 4   longitude  72575 non-null  float64
 5   year  72757 non-null  int64
 6   season  72757 non-null  int64
 7   month  72757 non-null  int64
 8   week  72757 non-null  int64
 9   day_of_week  72757 non-null  int64
 10  hour  72757 non-null  int64
 11  day_of_month  72757 non-null  int64
 12  day_of_year  72757 non-null  int64
 13  time  72757 non-null  datetime64[ns]
 14  temperature_2m (°C)  72757 non-null  float64
 15  relative_humidity_2m (%)  72757 non-null  int64
 16  dew_point_2m (°C)  72757 non-null  float64
 17  precipitation (mm)  72757 non-null  float64
 18  pressure_msl (hPa)  72757 non-null  float64
 19  surface_pressure (hPa)  72757 non-null  float64
 20  cloud_cover (%)  72757 non-null  int64
 21  et0_fao_evapotranspiration (mm)  72757 non-null  float64
 22  vapour_pressure_deficit (kPa)  72757 non-null  float64
 23  wind_speed_10m (km/h)  72757 non-null  float64
 24  wind_gusts_10m (km/h)  72757 non-null  float64
 25  soil_temperature_0_to_7cm (°C)  72757 non-null  float64
 26  soil_temperature_7_to_28cm (°C)  72757 non-null  float64
 27  soil_temperature_28_to_100cm (°C)  72757 non-null  float64
 28  soil_temperature_100_to_255cm (°C)  72757 non-null  float64
 29  soil_moisture_0_to_7cm (m³/m³)  72757 non-null  float64
 30  soil_moisture_7_to_28cm (m³/m³)  72757 non-null  float64
 31  soil_moisture_28_to_100cm (m³/m³)  72757 non-null  float64
 32  soil_moisture_100_to_255cm (m³/m³)  72757 non-null  float64
 33  is_day ()  72757 non-null  int64
 34  sunshine_duration (s)  72757 non-null  float64
 35  shortwave_radiation_instant (W/m²)  72757 non-null  float64
 36  direct_radiation_instant (W/m²)  72757 non-null  float64
 37  diffuse_radiation_instant (W/m²)  72757 non-null  float64
 38  direct_normal_irradiance_instant (W/m²)  72757 non-null  float64
 39  terrestrial_radiation_instant (W/m²)  72757 non-null  float64
dtypes: datetime64[ns](2), float64(26), int64(12)
memory usage: 22.8 MB
# Extracting the week number and year to group by
galicia_weather['year_week'] = galicia_weather['time'].dt.strftime('%Y-%U')
# Aggregating data to weekly, taking the mean of the numeric columns for each week
weekly_weather_data = galicia_weather.groupby('year_week').mean(numeric_only=True).reset_index()
# Splitting the 'year_week' column into separate 'year' and 'week' columns
weekly_weather_data[['year', 'week']] = weekly_weather_data['year_week'].str.split('-', expand=True).astype(int)
# Dropping the 'year_week' column
weekly_weather_data.drop(columns=['year_week'], inplace=True)
# Displaying the first few rows of the weekly aggregated data
weekly_weather_data.head()
| | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | vapour_pressure_deficit (kPa) | wind_speed_10m (km/h) | ... | direct_normal_irradiance_instant (W/m²) | terrestrial_radiation_instant (W/m²) | year | season | month | week | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.945833 | 81.166667 | -1.100000 | 0.000000 | 1028.962500 | 963.566667 | 7.166667 | 0.034167 | 0.157500 | 4.691667 | ... | 231.166667 | 139.258333 | 2000 | 1.0 | 1.0 | 0 | 6.0 | 12.5 | 1.0 | 1.0 |
| 1 | 5.446429 | 84.023810 | 2.852381 | 0.016071 | 1025.641071 | 961.237500 | 55.160714 | 0.034643 | 0.162976 | 8.160119 | ... | 140.266667 | 142.637500 | 2000 | 1.0 | 1.0 | 1 | 4.0 | 12.5 | 5.0 | 5.0 |
| 2 | 2.594048 | 85.452381 | 0.247619 | 0.183929 | 1024.692857 | 959.710714 | 42.285714 | 0.029524 | 0.118274 | 12.727381 | ... | 156.902976 | 150.438690 | 2000 | 1.0 | 1.0 | 2 | 4.0 | 12.5 | 12.0 | 12.0 |
| 3 | 2.028571 | 84.505952 | -0.460714 | 0.000000 | 1027.607738 | 962.308929 | 13.339286 | 0.041310 | 0.131071 | 10.794048 | ... | 255.066667 | 160.467857 | 2000 | 1.0 | 1.0 | 3 | 4.0 | 12.5 | 19.0 | 19.0 |
| 4 | 2.569048 | 83.976190 | -0.035119 | 0.002976 | 1024.231548 | 959.273810 | 41.148810 | 0.038929 | 0.135893 | 9.242857 | ... | 181.782738 | 173.112500 | 2000 | 1.0 | 1.0 | 4 | 4.0 | 12.5 | 26.0 | 26.0 |
5 rows × 34 columns
# Extracting the ISO calendar year and week to group by
galicia_weather['year'] = galicia_weather['time'].dt.isocalendar().year
galicia_weather['week'] = galicia_weather['time'].dt.isocalendar().week
# Grouping by the ISO calendar year and week and taking the mean of the numeric columns for each week
weekly_weather_data = galicia_weather.groupby(['year', 'week']).mean(numeric_only=True).reset_index()
# Displaying the first few rows of the weekly aggregated data
weekly_weather_data.head()
| | year | week | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | ... | direct_radiation_instant (W/m²) | diffuse_radiation_instant (W/m²) | direct_normal_irradiance_instant (W/m²) | terrestrial_radiation_instant (W/m²) | season | month | day_of_week | hour | day_of_month | day_of_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999 | 52 | 2.252083 | 82.395833 | -0.608333 | 0.000000 | 1028.852083 | 963.527083 | 8.145833 | 0.034375 | ... | 70.202083 | 21.145833 | 225.227083 | 139.622917 | 1.0 | 1.0 | 6.5 | 12.5 | 1.5 | 1.5 |
| 1 | 2000 | 1 | 5.850595 | 84.660714 | 3.364881 | 0.029167 | 1025.888095 | 961.562500 | 63.571429 | 0.033929 | ... | 40.952976 | 26.702976 | 125.539881 | 143.596429 | 1.0 | 1.0 | 4.0 | 12.5 | 6.0 | 6.0 |
| 2 | 2000 | 2 | 2.138690 | 84.767857 | -0.310119 | 0.170833 | 1023.983929 | 958.941667 | 33.559524 | 0.030952 | ... | 57.545238 | 21.491071 | 176.457738 | 151.745238 | 1.0 | 1.0 | 4.0 | 12.5 | 13.0 | 13.0 |
| 3 | 2000 | 3 | 2.170238 | 84.922619 | -0.255357 | 0.000000 | 1027.510119 | 962.251786 | 16.196429 | 0.040952 | ... | 83.503571 | 24.485119 | 247.979762 | 162.075595 | 1.0 | 1.0 | 4.0 | 12.5 | 20.0 | 20.0 |
| 4 | 2000 | 4 | 3.189286 | 84.494048 | 0.655952 | 0.002976 | 1025.488690 | 960.589286 | 44.785714 | 0.041012 | ... | 66.744048 | 30.870238 | 183.569643 | 175.187500 | 1.0 | 1.0 | 4.0 | 12.5 | 27.0 | 27.0 |
5 rows × 34 columns
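The first row of the table above is labelled year 1999, week 52, even though the data starts on 01-01-2000. This follows from the ISO calendar: ISO weeks start on Monday, so Saturday 1 and Sunday 2 January 2000 still belong to the last ISO week of 1999. A minimal sketch using only the standard library (the helper name `iso_year_week` is made up for this illustration):

```python
from datetime import date

def iso_year_week(d):
    """Return (ISO year, ISO week) for a date, mirroring dt.isocalendar()."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

# The first three days of the dataset
for d in (date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3)):
    print(d, "->", iso_year_week(d))
```

Because of this, the `year` column used for grouping is the ISO year and can differ from the calendar year for dates around New Year.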
weekly_weather_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1271 entries, 0 to 1270
Data columns (total 34 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year  1271 non-null  UInt32
 1   week  1271 non-null  UInt32
 2   temperature_2m (°C)  1271 non-null  float64
 3   relative_humidity_2m (%)  1271 non-null  float64
 4   dew_point_2m (°C)  1271 non-null  float64
 5   precipitation (mm)  1271 non-null  float64
 6   pressure_msl (hPa)  1271 non-null  float64
 7   surface_pressure (hPa)  1271 non-null  float64
 8   cloud_cover (%)  1271 non-null  float64
 9   et0_fao_evapotranspiration (mm)  1271 non-null  float64
 10  vapour_pressure_deficit (kPa)  1271 non-null  float64
 11  wind_speed_10m (km/h)  1271 non-null  float64
 12  wind_gusts_10m (km/h)  1271 non-null  float64
 13  soil_temperature_0_to_7cm (°C)  1271 non-null  float64
 14  soil_temperature_7_to_28cm (°C)  1271 non-null  float64
 15  soil_temperature_28_to_100cm (°C)  1271 non-null  float64
 16  soil_temperature_100_to_255cm (°C)  1271 non-null  float64
 17  soil_moisture_0_to_7cm (m³/m³)  1271 non-null  float64
 18  soil_moisture_7_to_28cm (m³/m³)  1271 non-null  float64
 19  soil_moisture_28_to_100cm (m³/m³)  1271 non-null  float64
 20  soil_moisture_100_to_255cm (m³/m³)  1271 non-null  float64
 21  is_day ()  1271 non-null  float64
 22  sunshine_duration (s)  1271 non-null  float64
 23  shortwave_radiation_instant (W/m²)  1271 non-null  float64
 24  direct_radiation_instant (W/m²)  1271 non-null  float64
 25  diffuse_radiation_instant (W/m²)  1271 non-null  float64
 26  direct_normal_irradiance_instant (W/m²)  1271 non-null  float64
 27  terrestrial_radiation_instant (W/m²)  1271 non-null  float64
 28  season  1271 non-null  float64
 29  month  1271 non-null  float64
 30  day_of_week  1271 non-null  float64
 31  hour  1271 non-null  float64
 32  day_of_month  1271 non-null  float64
 33  day_of_year  1271 non-null  float64
dtypes: UInt32(2), float64(32)
memory usage: 330.3 KB
print(weekly_weather_data['week'].unique())
<IntegerArray> [52, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53] Length: 53, dtype: UInt32
weekly_fire_alerts = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\modis_fire_alerts__count.csv")
weekly_fire_alerts.head(5)
| | adm2 | year | week | alert__count | confidence_category |
|---|---|---|---|---|---|
| 0 | 2 | 2016 | 32 | 4 | h |
| 1 | 3 | 2018 | 24 | 1 | n |
| 2 | 3 | 2023 | 34 | 1 | n |
| 3 | 3 | 2012 | 2 | 2 | n |
| 4 | 4 | 2017 | 31 | 1 | h |
weekly_fire_alerts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1012 entries, 0 to 1011
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   adm2  1012 non-null  int64
 1   year  1012 non-null  int64
 2   week  1012 non-null  int64
 3   alert__count  1012 non-null  int64
 4   confidence_category  1012 non-null  object
dtypes: int64(4), object(1)
memory usage: 39.7+ KB
# Merging the two datasets on 'year' and 'week' columns
merged_weekly_firealert_weather_data = pd.merge(weekly_weather_data, weekly_fire_alerts, on=['year', 'week'], how='left')
# Filtering the rows where alert__count is not NaN and greater than zero
filtered_weekly_datav1 = merged_weekly_firealert_weather_data[merged_weekly_firealert_weather_data['alert__count'].notna()
& (merged_weekly_firealert_weather_data['alert__count'] > 0)]
filtered_weekly_datav1.head()
| | year | week | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | ... | terrestrial_radiation_instant (W/m²) | season | month | day_of_week | hour | day_of_month | day_of_year | adm2 | alert__count | confidence_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 628 | 2012 | 2 | 4.217262 | 87.547619 | 2.163690 | 0.052976 | 1026.578571 | 961.840476 | 35.559524 | 0.035833 | ... | 150.533333 | 1.0 | 1.0 | 4.0 | 12.5 | 12.0 | 12.0 | 3.0 | 2.0 | n |
| 630 | 2012 | 4 | 4.857738 | 86.434524 | 2.619048 | 0.014881 | 1025.511310 | 960.983929 | 36.309524 | 0.041131 | ... | 173.393452 | 1.0 | 1.0 | 4.0 | 12.5 | 26.0 | 26.0 | 4.0 | 3.0 | n |
| 631 | 2012 | 4 | 4.857738 | 86.434524 | 2.619048 | 0.014881 | 1025.511310 | 960.983929 | 36.309524 | 0.041131 | ... | 173.393452 | 1.0 | 1.0 | 4.0 | 12.5 | 26.0 | 26.0 | 3.0 | 2.0 | n |
| 632 | 2012 | 4 | 4.857738 | 86.434524 | 2.619048 | 0.014881 | 1025.511310 | 960.983929 | 36.309524 | 0.041131 | ... | 173.393452 | 1.0 | 1.0 | 4.0 | 12.5 | 26.0 | 26.0 | 2.0 | 1.0 | n |
| 633 | 2012 | 4 | 4.857738 | 86.434524 | 2.619048 | 0.014881 | 1025.511310 | 960.983929 | 36.309524 | 0.041131 | ... | 173.393452 | 1.0 | 1.0 | 4.0 | 12.5 | 26.0 | 26.0 | 4.0 | 2.0 | h |
5 rows × 37 columns
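In the table above, rows 630 to 633 all describe week 4 of 2012 with identical weather values. The fire alert file holds one row per `adm2` region and confidence category, so the merge repeats each weekly weather row once per matching alert row. The toy example below reproduces this effect and shows one possible alternative, summing alerts per week before merging. The frames and their values are invented for illustration, and the aggregation step is an assumption, not what this notebook does:

```python
import pandas as pd

# Hypothetical weekly weather: one row per (year, week)
weather = pd.DataFrame({"year": [2012, 2012], "week": [2, 4], "temperature": [4.2, 4.9]})

# Hypothetical alerts: several regions can report in the same week
alerts = pd.DataFrame({
    "adm2": [3, 4, 3, 2],
    "year": [2012, 2012, 2012, 2012],
    "week": [2, 4, 4, 4],
    "alert__count": [2, 3, 2, 1],
})

# A plain merge repeats the week-4 weather row for every matching alert row
merged = weather.merge(alerts, on=["year", "week"], how="left")
print(len(weather), "weather rows ->", len(merged), "merged rows")

# Alternative: total the alerts per week first, so each week appears once
weekly_total = alerts.groupby(["year", "week"], as_index=False)["alert__count"].sum()
one_row_per_week = weather.merge(weekly_total, on=["year", "week"], how="left")
print(one_row_per_week)
```

Whether repeated weather rows are acceptable depends on the question being asked; for correlations they give weeks with alerts in many regions more weight.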
# Further filtering the data to keep only rows where confidence_category is 'h'
filtered_weekly_datav2 = filtered_weekly_datav1[filtered_weekly_datav1['confidence_category'] == 'h']
# Creating custom diverging colormap
cmap_negative = plt.get_cmap('RdBu_r', 128)
cmap_positive = plt.get_cmap('RdBu_r', 128)
# Combining them into a custom diverging colormap
custom_cmap = ListedColormap(np.vstack((cmap_negative(np.linspace(0.5, 1, 128)), cmap_positive(np.linspace(0, 0.5, 128)))))
# Selecting the columns for correlation analysis
columns_of_interest = [
'temperature_2m (°C)', 'relative_humidity_2m (%)', 'dew_point_2m (°C)', 'precipitation (mm)',
'pressure_msl (hPa)', 'surface_pressure (hPa)', 'cloud_cover (%)',
'et0_fao_evapotranspiration (mm)', 'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)',
'wind_gusts_10m (km/h)', 'soil_temperature_0_to_7cm (°C)', 'soil_temperature_7_to_28cm (°C)',
'soil_temperature_28_to_100cm (°C)', 'soil_temperature_100_to_255cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)',
'soil_moisture_7_to_28cm (m³/m³)', 'soil_moisture_28_to_100cm (m³/m³)', 'soil_moisture_100_to_255cm (m³/m³)',
'sunshine_duration (s)', 'shortwave_radiation_instant (W/m²)', 'direct_radiation_instant (W/m²)',
'diffuse_radiation_instant (W/m²)', 'direct_normal_irradiance_instant (W/m²)', 'terrestrial_radiation_instant (W/m²)',
'alert__count'
]
# Computing the correlation matrix
corr_matrix = filtered_weekly_datav2[columns_of_interest].corr()
# Extracting the last row ('alert__count') correlation
alert_corr = corr_matrix.loc['alert__count']
# Creating the heatmap with seaborn
plt.figure(figsize=(15, 1))
sns.heatmap(alert_corr.values.reshape(1, -1), annot=True, cmap=custom_cmap, fmt='.2f', xticklabels=alert_corr.index, yticklabels=['alert__count'], cbar=False, linewidths=.5, center=0)
# Setting plot title
plt.title('Correlation of # Fire Alert Counts with Weather Parameters')
plt.savefig(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\images\correlation.png')
# Showing the plot
plt.show()
filtered_weekly_datav2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 197 entries, 633 to 1974
Data columns (total 37 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year  197 non-null  UInt32
 1   week  197 non-null  UInt32
 2   temperature_2m (°C)  197 non-null  float64
 3   relative_humidity_2m (%)  197 non-null  float64
 4   dew_point_2m (°C)  197 non-null  float64
 5   precipitation (mm)  197 non-null  float64
 6   pressure_msl (hPa)  197 non-null  float64
 7   surface_pressure (hPa)  197 non-null  float64
 8   cloud_cover (%)  197 non-null  float64
 9   et0_fao_evapotranspiration (mm)  197 non-null  float64
 10  vapour_pressure_deficit (kPa)  197 non-null  float64
 11  wind_speed_10m (km/h)  197 non-null  float64
 12  wind_gusts_10m (km/h)  197 non-null  float64
 13  soil_temperature_0_to_7cm (°C)  197 non-null  float64
 14  soil_temperature_7_to_28cm (°C)  197 non-null  float64
 15  soil_temperature_28_to_100cm (°C)  197 non-null  float64
 16  soil_temperature_100_to_255cm (°C)  197 non-null  float64
 17  soil_moisture_0_to_7cm (m³/m³)  197 non-null  float64
 18  soil_moisture_7_to_28cm (m³/m³)  197 non-null  float64
 19  soil_moisture_28_to_100cm (m³/m³)  197 non-null  float64
 20  soil_moisture_100_to_255cm (m³/m³)  197 non-null  float64
 21  is_day ()  197 non-null  float64
 22  sunshine_duration (s)  197 non-null  float64
 23  shortwave_radiation_instant (W/m²)  197 non-null  float64
 24  direct_radiation_instant (W/m²)  197 non-null  float64
 25  diffuse_radiation_instant (W/m²)  197 non-null  float64
 26  direct_normal_irradiance_instant (W/m²)  197 non-null  float64
 27  terrestrial_radiation_instant (W/m²)  197 non-null  float64
 28  season  197 non-null  float64
 29  month  197 non-null  float64
 30  day_of_week  197 non-null  float64
 31  hour  197 non-null  float64
 32  day_of_month  197 non-null  float64
 33  day_of_year  197 non-null  float64
 34  adm2  197 non-null  float64
 35  alert__count  197 non-null  float64
 36  confidence_category  197 non-null  object
dtypes: UInt32(2), float64(34), object(1)
memory usage: 57.3+ KB
filtered_weekly_datav2.head(5)
| | year | week | temperature_2m (°C) | relative_humidity_2m (%) | dew_point_2m (°C) | precipitation (mm) | pressure_msl (hPa) | surface_pressure (hPa) | cloud_cover (%) | et0_fao_evapotranspiration (mm) | ... | terrestrial_radiation_instant (W/m²) | season | month | day_of_week | hour | day_of_month | day_of_year | adm2 | alert__count | confidence_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 633 | 2012 | 4 | 4.857738 | 86.434524 | 2.619048 | 0.014881 | 1025.511310 | 960.983929 | 36.309524 | 0.041131 | ... | 173.393452 | 1.000000 | 1.000000 | 4.0 | 12.5 | 26.000000 | 26.0 | 4.0 | 2.0 | h |
| 645 | 2012 | 8 | 6.883333 | 78.547619 | 2.991667 | 0.000000 | 1027.979762 | 963.739286 | 23.523810 | 0.081726 | ... | 244.268452 | 1.000000 | 2.000000 | 4.0 | 12.5 | 23.000000 | 54.0 | 3.0 | 2.0 | h |
| 651 | 2012 | 9 | 8.844048 | 83.238095 | 5.794048 | 0.026190 | 1024.092857 | 960.533333 | 48.523810 | 0.076607 | ... | 265.011310 | 1.571429 | 2.571429 | 4.0 | 12.5 | 13.428571 | 61.0 | 1.0 | 2.0 | h |
| 656 | 2012 | 9 | 8.844048 | 83.238095 | 5.794048 | 0.026190 | 1024.092857 | 960.533333 | 48.523810 | 0.076607 | ... | 265.011310 | 1.571429 | 2.571429 | 4.0 | 12.5 | 13.428571 | 61.0 | 4.0 | 2.0 | h |
| 657 | 2012 | 9 | 8.844048 | 83.238095 | 5.794048 | 0.026190 | 1024.092857 | 960.533333 | 48.523810 | 0.076607 | ... | 265.011310 | 1.571429 | 2.571429 | 4.0 | 12.5 | 13.428571 | 61.0 | 3.0 | 15.0 | h |
5 rows × 37 columns
print(filtered_weekly_datav2['alert__count'].unique())
[ 2. 15. 3. 6. 7. 1. 5. 18. 8. 44. 4. 9. 42. 12. 34. 31. 57. 10. 62. 68. 16. 43. 21. 70. 112. 64. 14. 30.]
Machine Learning Part
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import xgboost as xgb
# ML Step 1: Data Preprocessing
# Assuming `filtered_weekly_datav2` is already loaded
df = filtered_weekly_datav2.copy()
# Remove any missing values
df.dropna(inplace=True)
# ML Step 2: Defining Features and Target
# Defining the target variable: 1 if 'alert__count' > 10, else 0
df['high_alert'] = (df['alert__count'] > 10).astype(int)
# Selecting relevant features
features = ['season', 'month', 'week', 'temperature_2m (°C)', 'relative_humidity_2m (%)',
'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)',
'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)',
'direct_normal_irradiance_instant (W/m²)']
X = df[features]
y = df['high_alert']
# ML Step 3: Data Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# ML Step 4: Training XGBoost Model
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X_train, y_train)
XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=None,
gamma=None, grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=None, max_bin=None,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=None, max_leaves=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=None, n_jobs=None,
num_parallel_tree=None, random_state=None, ...)
# ML Step 5: Model Evaluation
y_pred = xgb_model.predict(X_test)
# Confusion Matrix and Classification Report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
[[49 4]
[ 4 3]]
precision recall f1-score support
0 0.92 0.92 0.92 53
1 0.43 0.43 0.43 7
accuracy 0.87 60
macro avg 0.68 0.68 0.68 60
weighted avg 0.87 0.87 0.87 60
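The figures in the report above can be recomputed by hand from the confusion matrix, which helps put the 0.87 accuracy in context. In scikit-learn's `confusion_matrix`, rows are the true classes and columns the predicted ones. The sketch below uses only the four printed counts; the "majority baseline" (always predicting class 0) is added here for comparison and is not part of the original evaluation:

```python
# Counts taken from the printed confusion matrix [[49, 4], [4, 3]]
tn, fp = 49, 4   # true class 0: predicted 0, predicted 1
fn, tp = 4, 3    # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

precision = tp / (tp + fp)   # share of predicted high-alert weeks that were correct
recall = tp / (tp + fn)      # share of actual high-alert weeks that were found
accuracy = (tp + tn) / total

# Accuracy of a trivial model that always predicts class 0
majority_baseline = (tn + fp) / total

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"accuracy={accuracy:.2f} majority_baseline={majority_baseline:.2f}")
```

With only 7 high-alert weeks among the 60 test samples, overall accuracy is dominated by class 0, so the class 1 precision and recall are the more informative numbers.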
from sklearn.inspection import permutation_importance
# Calculating permutation importance
perm_importance = permutation_importance(xgb_model, X_test, y_test, n_repeats=10, random_state=42)
# Printing feature importances
feature_importance = pd.Series(perm_importance.importances_mean, index=X_test.columns)
print("Permutation Feature Importance:")
print(feature_importance.sort_values(ascending=False))
Permutation Feature Importance:
wind_speed_10m (km/h)  0.015000
vapour_pressure_deficit (kPa)  0.005000
month  0.000000
week  0.000000
direct_normal_irradiance_instant (W/m²)  0.000000
precipitation (mm)  -0.005000
soil_temperature_0_to_7cm (°C)  -0.008333
relative_humidity_2m (%)  -0.008333
et0_fao_evapotranspiration (mm)  -0.010000
temperature_2m (°C)  -0.011667
season  -0.013333
soil_moisture_0_to_7cm (m³/m³)  -0.023333
dtype: float64
from sklearn.inspection import PartialDependenceDisplay
# Plotting partial dependence for key features
# (plot_partial_dependence was deprecated in scikit-learn 1.0 and removed in 1.2;
# PartialDependenceDisplay.from_estimator is the replacement named in the deprecation warning)
key_features = ['temperature_2m (°C)', 'precipitation (mm)', 'relative_humidity_2m (%)']
PartialDependenceDisplay.from_estimator(xgb_model, X_test, key_features)
plt.show()
high_fire = df[df['high_alert'] == 1][features]
print("High Fire Counts Feature Means:")
print(high_fire.mean())
High Fire Counts Feature Means:
season  3.308571
month  8.102857
week  32.960000
temperature_2m (°C)  19.103929
relative_humidity_2m (%)  70.141190
precipitation (mm)  0.033500
et0_fao_evapotranspiration (mm)  0.178805
vapour_pressure_deficit (kPa)  0.835679
wind_speed_10m (km/h)  12.126881
soil_temperature_0_to_7cm (°C)  20.664310
soil_moisture_0_to_7cm (m³/m³)  0.141064
direct_normal_irradiance_instant (W/m²)  300.593048
dtype: float64
print(high_fire.std())
season  0.601416
month  1.517203
week  6.592167
temperature_2m (°C)  3.515758
relative_humidity_2m (%)  9.240537
precipitation (mm)  0.067844
et0_fao_evapotranspiration (mm)  0.051497
vapour_pressure_deficit (kPa)  0.348321
wind_speed_10m (km/h)  2.623547
soil_temperature_0_to_7cm (°C)  3.813663
soil_moisture_0_to_7cm (m³/m³)  0.061367
direct_normal_irradiance_instant (W/m²)  79.538858
dtype: float64
low_fire = df[df['high_alert'] == 0][features]
print("Low Fire Counts Feature Means:")
print(low_fire.mean())
Low Fire Counts Feature Means:
season  2.882890
month  6.745017
week  27.523256
temperature_2m (°C)  15.577357
relative_humidity_2m (%)  74.007821
precipitation (mm)  0.039708
et0_fao_evapotranspiration (mm)  0.150774
vapour_pressure_deficit (kPa)  0.597784
wind_speed_10m (km/h)  11.970449
soil_temperature_0_to_7cm (°C)  17.142701
soil_moisture_0_to_7cm (m³/m³)  0.182801
direct_normal_irradiance_instant (W/m²)  281.338611
dtype: float64
print(low_fire.std())
season  0.864864
month  2.517798
week  10.920475
temperature_2m (°C)  4.535791
relative_humidity_2m (%)  5.887160
precipitation (mm)  0.071569
et0_fao_evapotranspiration (mm)  0.048418
vapour_pressure_deficit (kPa)  0.238383
wind_speed_10m (km/h)  2.717183
soil_temperature_0_to_7cm (°C)  5.143733
soil_moisture_0_to_7cm (m³/m³)  0.083412
direct_normal_irradiance_instant (W/m²)  72.368410
dtype: float64
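# numeric_only=True skips the text column 'confidence_category' (required in pandas 2.0 and later)
corr = df.corr(numeric_only=True)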
print("Correlation with High Alert Counts:")
print(corr['high_alert'].sort_values(ascending=False))
Correlation with High Alert Counts:
high_alert  1.000000
alert__count  0.783563
vapour_pressure_deficit (kPa)  0.298493
temperature_2m (°C)  0.257748
soil_temperature_0_to_7cm (°C)  0.229397
soil_temperature_7_to_28cm (°C)  0.227418
soil_temperature_28_to_100cm (°C)  0.211036
dew_point_2m (°C)  0.193162
et0_fao_evapotranspiration (mm)  0.188695
soil_temperature_100_to_255cm (°C)  0.188130
month  0.184714
day_of_year  0.175255
week  0.170935
season  0.167757
year  0.130221
direct_radiation_instant (W/m²)  0.097938
sunshine_duration (s)  0.087851
direct_normal_irradiance_instant (W/m²)  0.087560
adm2  0.080088
shortwave_radiation_instant (W/m²)  0.078600
terrestrial_radiation_instant (W/m²)  0.052186
wind_gusts_10m (km/h)  0.022008
wind_speed_10m (km/h)  0.019339
is_day ()  0.017811
cloud_cover (%)  -0.017529
precipitation (mm)  -0.029191
diffuse_radiation_instant (W/m²)  -0.047867
surface_pressure (hPa)  -0.068558
pressure_msl (hPa)  -0.127857
day_of_month  -0.139768
soil_moisture_0_to_7cm (m³/m³)  -0.169841
soil_moisture_7_to_28cm (m³/m³)  -0.186412
relative_humidity_2m (%)  -0.198260
soil_moisture_28_to_100cm (m³/m³)  -0.201943
soil_moisture_100_to_255cm (m³/m³)  -0.244854
day_of_week  NaN
hour  NaN
Name: high_alert, dtype: float64
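The `NaN` entries for `day_of_week` and `hour` in the output above are expected. After weekly averaging, every complete week has the same mean for these columns (4.0 and 12.5 in the tables shown earlier), and a column with zero variance has no defined Pearson correlation. A small illustration with invented values and shortened column names:

```python
import pandas as pd

demo = pd.DataFrame({
    "high_alert": [0, 1, 0, 1],
    "temperature": [15.0, 19.0, 14.0, 21.0],
    "hour": [12.5, 12.5, 12.5, 12.5],  # constant, as after weekly averaging
})

# Pearson correlation divides by the standard deviation, which is 0 for 'hour'
corr_demo = demo.corr()["high_alert"]
print(corr_demo)
```

Such constant columns carry no information for the model and can be dropped before analysis.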
import matplotlib.pyplot as plt
# Listing of features to analyze
boxplotfeatures = ['temperature_2m (°C)', 'relative_humidity_2m (%)',
'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)',
'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)',
'direct_normal_irradiance_instant (W/m²)']
# Calculating and printing summary statistics
summary_stats = high_fire[boxplotfeatures].describe().T[['min', '25%', '50%', '75%', 'max']]
print(summary_stats)
# Creating 3x3 subplots for box plots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
# Iterating over features and corresponding subplot axes
for i, feature in enumerate(boxplotfeatures):
    ax = axes[i//3, i%3]
    ax.boxplot(high_fire[feature].dropna())
    ax.set_title(feature)
    ax.set_ylabel('Values')
    ax.set_xticks([])
# Removing empty subplots
for i in range(len(boxplotfeatures), len(axes.flatten())):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
plt.show()
| | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|
| temperature_2m (°C) | 8.844048 | 17.812500 | 19.305357 | 20.339881 | 26.552976 |
| relative_humidity_2m (%) | 52.791667 | 63.440476 | 73.517857 | 76.083333 | 83.541667 |
| precipitation (mm) | 0.000000 | 0.000000 | 0.003571 | 0.022619 | 0.285119 |
| et0_fao_evapotranspiration (mm) | 0.057083 | 0.155536 | 0.191131 | 0.210774 | 0.264643 |
| vapour_pressure_deficit (kPa) | 0.244524 | 0.631190 | 0.758095 | 1.060536 | 1.633810 |
| wind_speed_10m (km/h) | 7.463095 | 10.320238 | 12.439286 | 14.095833 | 16.160714 |
| soil_temperature_0_to_7cm (°C) | 8.925000 | 18.913690 | 20.604167 | 23.627381 | 26.592262 |
| soil_moisture_0_to_7cm (m³/m³) | 0.097875 | 0.107679 | 0.113750 | 0.136482 | 0.294196 |
| direct_normal_irradiance_instant (W/m²) | 92.613095 | 287.964286 | 328.675000 | 346.464881 | 402.494643 |
# Listing of features to analyze
boxplotfeatures = ['temperature_2m (°C)', 'relative_humidity_2m (%)',
'precipitation (mm)', 'et0_fao_evapotranspiration (mm)',
'vapour_pressure_deficit (kPa)', 'wind_speed_10m (km/h)',
'soil_temperature_0_to_7cm (°C)', 'soil_moisture_0_to_7cm (m³/m³)',
'direct_normal_irradiance_instant (W/m²)']
# Calculating and printing summary statistics
summary_stats = high_fire[boxplotfeatures].describe().T[['min', '25%', '50%', '75%', 'max']]
print(summary_stats)
# Creating 3x3 subplots for box plots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
# Listing of colors for each boxplot
colors = ['#FF9999', '#66B3FF', '#99FF99', '#FFCC99', '#FF6666', '#66CC99', '#FFCC66', '#6666FF', '#CC99FF']
# Iterating over features and corresponding subplot axes
for i, feature in enumerate(boxplotfeatures):
    ax = axes[i // 3, i % 3]
    bp = ax.boxplot(high_fire[feature].dropna(), patch_artist=True, notch=True, boxprops=dict(facecolor=colors[i], color=colors[i]),
                    whiskerprops=dict(color=colors[i], linewidth=2), capprops=dict(color=colors[i], linewidth=2),
                    medianprops=dict(color='black', linewidth=2))
    ax.set_title(feature)
    ax.set_ylabel('Values')
    ax.set_xticks([])
    ax.grid(True, linestyle='--', alpha=0.7)
# Removing empty subplots
for i in range(len(boxplotfeatures), len(axes.flatten())):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
plt.show()
| | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|
| temperature_2m (°C) | 8.844048 | 17.812500 | 19.305357 | 20.339881 | 26.552976 |
| relative_humidity_2m (%) | 52.791667 | 63.440476 | 73.517857 | 76.083333 | 83.541667 |
| precipitation (mm) | 0.000000 | 0.000000 | 0.003571 | 0.022619 | 0.285119 |
| et0_fao_evapotranspiration (mm) | 0.057083 | 0.155536 | 0.191131 | 0.210774 | 0.264643 |
| vapour_pressure_deficit (kPa) | 0.244524 | 0.631190 | 0.758095 | 1.060536 | 1.633810 |
| wind_speed_10m (km/h) | 7.463095 | 10.320238 | 12.439286 | 14.095833 | 16.160714 |
| soil_temperature_0_to_7cm (°C) | 8.925000 | 18.913690 | 20.604167 | 23.627381 | 26.592262 |
| soil_moisture_0_to_7cm (m³/m³) | 0.097875 | 0.107679 | 0.113750 | 0.136482 | 0.294196 |
| direct_normal_irradiance_instant (W/m²) | 92.613095 | 287.964286 | 328.675000 | 346.464881 | 402.494643 |
# Loading the weather forecast dataset for May 2024
weatherforecasts = pd.read_csv(r"C:\Users\45502\Desktop\galicia\final datasets\weatherforecasts.csv", encoding='ISO-8859-1')
# Converting 'time' column to datetime and set it as the index
weatherforecasts['time'] = pd.to_datetime(weatherforecasts['time'])
weatherforecasts.set_index('time', inplace=True)
# Defining features and their thresholds
features = {
'temperature_2m (°C)': ('above', 19.103929),
'relative_humidity_2m (%)': ('below', 70.141190),
'et0_fao_evapotranspiration (mm)': ('above', 0.178805),
'vapour_pressure_deficit (kPa)': ('above', 0.835679),
'wind_speed_10m (km/h)': ('above', 12.126881),
'soil_temperature_0cm (°C)': ('above', 20.664310),
'soil_moisture_0_to_1cm (m³/m³)': ('below', 0.141064),
'direct_normal_irradiance_instant (W/m²)': ('above', 300.593048)
}
# Setting up the subplots
fig, axes = plt.subplots(len(features), 1, figsize=(15, len(features) * 3))
# Plotting each feature in a separate subplot
for i, (feature, (condition, threshold)) in enumerate(features.items()):
    ax = axes[i]
    ax.plot(weatherforecasts.index, weatherforecasts[feature], label=feature, color='blue')
    # Adding the threshold zone
    if condition == 'above':
        ax.axhspan(threshold, weatherforecasts[feature].max(), color='red', alpha=0.3)
    else:
        ax.axhspan(weatherforecasts[feature].min(), threshold, color='red', alpha=0.3)
    ax.set_title(f'{feature}')
    ax.set_ylabel('Values')
    ax.legend()
    # Customizing the x-axis
    ax.set_xticks(weatherforecasts.index[::24])
    ax.set_xticklabels(weatherforecasts.index[::24].strftime('%Y-%m-%d'), rotation=45, ha='right')
plt.tight_layout()
plt.show()
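The plots above shade the zone where each forecast parameter crosses its threshold (the mean value observed in high-alert weeks). The same thresholds can also be turned into a table of flags, which makes it easier to spot hours where several parameters are in the danger zone at once. The sketch below is only an illustration under assumptions: the forecast values are invented, the column names are shortened, and counting exceeded thresholds is not a step taken in this notebook:

```python
import pandas as pd

# Hypothetical three-hour forecast with shortened column names
forecast = pd.DataFrame(
    {"temperature": [18.0, 20.5, 22.0], "humidity": [75.0, 68.0, 72.0]},
    index=pd.to_datetime(["2024-05-01 12:00", "2024-05-01 13:00", "2024-05-01 14:00"]),
)

# Same (direction, threshold) structure as the features dictionary above
thresholds = {"temperature": ("above", 19.103929), "humidity": ("below", 70.141190)}

flags = pd.DataFrame(index=forecast.index)
for name, (condition, threshold) in thresholds.items():
    if condition == "above":
        flags[name] = forecast[name] > threshold
    else:
        flags[name] = forecast[name] < threshold

# Number of parameters beyond their threshold in each hour
flags["n_exceeded"] = flags.sum(axis=1)
print(flags)
```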
# Defining features and their thresholds
features = {
'temperature_2m (°C)': ('above', 19.103929),
'relative_humidity_2m (%)': ('below', 70.141190),
'et0_fao_evapotranspiration (mm)': ('above', 0.178805),
'vapour_pressure_deficit (kPa)': ('above', 0.835679),
'wind_speed_10m (km/h)': ('above', 12.126881),
'soil_temperature_0cm (°C)': ('above', 20.664310),
'soil_moisture_0_to_1cm (m³/m³)': ('below', 0.141064),
'direct_normal_irradiance_instant (W/m²)': ('above', 300.593048)
}
# Setting up subplots
fig = make_subplots(rows=len(features), cols=1, subplot_titles=list(features.keys()))
# Plotting each feature in a separate subplot
for i, (feature, (condition, threshold)) in enumerate(features.items(), start=1):
    # Adding the line plot
    fig.add_trace(go.Scatter(
        x=weatherforecasts.index,
        y=weatherforecasts[feature],
        mode='lines',
        name=feature,
        hoverinfo='x+y',
        line=dict(color='blue')
    ), row=i, col=1)
    # Choosing the vertical extent of the threshold zone
    if condition == 'above':
        y0, y1 = threshold, weatherforecasts[feature].max()
    else:
        y0, y1 = weatherforecasts[feature].min(), threshold
    # Adding the threshold zone
    fig.add_shape(
        type='rect',
        x0=weatherforecasts.index.min(),
        x1=weatherforecasts.index.max(),
        y0=y0,
        y1=y1,
        fillcolor='red',
        opacity=0.3,
        line_width=0,
        row=i, col=1
    )
    # Updating x-axis for each subplot
    fig.update_xaxes(tickmode='array', tickvals=weatherforecasts.index[::24], ticktext=weatherforecasts.index[::24].strftime('%Y-%m-%d'), tickangle=45, row=i, col=1)
    # Updating y-axis for each subplot
    fig.update_yaxes(showgrid=True, tickmode='auto', nticks=15, row=i, col=1)
# Updating layout
fig.update_layout(height=len(features) * 300, showlegend=True, hovermode='x unified')
fig.write_html(r'C:\Users\45502\Desktop\galicia\githubnael\social-data-final.github.io\visuals\timeseries.html')
# Showing plot
fig.show()
# Function to create polygons from corner coordinates
def create_polygon(row):
    points = [(row[f'lon-{i}'], row[f'lat-{i}']) for i in range(1, 5)]
    return Polygon(points)
# Filtering data within the latitude and longitude range
def filter_data(data):
    lon_filtered = data.loc[
        (data['lon-1'] >= -9.3) & (data['lon-1'] <= -6.73) &
        (data['lon-2'] >= -9.3) & (data['lon-2'] <= -6.73) &
        (data['lon-3'] >= -9.3) & (data['lon-3'] <= -6.73) &
        (data['lon-4'] >= -9.3) & (data['lon-4'] <= -6.73)
    ]
    lat_filtered = lon_filtered.loc[
        (lon_filtered['lat-1'] >= 41.8) & (lon_filtered['lat-1'] <= 43.8) &
        (lon_filtered['lat-2'] >= 41.8) & (lon_filtered['lat-2'] <= 43.8) &
        (lon_filtered['lat-3'] >= 41.8) & (lon_filtered['lat-3'] <= 43.8) &
        (lon_filtered['lat-4'] >= 41.8) & (lon_filtered['lat-4'] <= 43.8)
    ]
    # Returning a copy so that adding the 'geometry' column later does not trigger SettingWithCopyWarning
    return lat_filtered.copy()
irre_data = filter_data(irre_data)
fire_data = filter_data(fire_data)
# Creating GeoDataFrames
irre_data['geometry'] = irre_data.apply(create_polygon, axis=1)
fire_data['geometry'] = fire_data.apply(create_polygon, axis=1)
gdf_irre = gpd.GeoDataFrame(irre_data, geometry='geometry')
gdf_fire = gpd.GeoDataFrame(fire_data, geometry='geometry')
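The bounding-box filter above keeps a grid cell only if all four of its corners lie inside the longitude range -9.3 to -6.73 and the latitude range 41.8 to 43.8. The same rule can be written more compactly by checking all corner columns at once. This is a sketch of an equivalent formulation, not the notebook's own function; the name `filter_to_bbox` and the two sample cells are invented for the example:

```python
import pandas as pd

# Bounding box used in this notebook for Galicia
LON_MIN, LON_MAX = -9.3, -6.73
LAT_MIN, LAT_MAX = 41.8, 43.8
LON_COLS = [f"lon-{i}" for i in range(1, 5)]
LAT_COLS = [f"lat-{i}" for i in range(1, 5)]

def filter_to_bbox(data):
    """Keep rows whose four corners all fall inside the bounding box."""
    lon_ok = data[LON_COLS].ge(LON_MIN).all(axis=1) & data[LON_COLS].le(LON_MAX).all(axis=1)
    lat_ok = data[LAT_COLS].ge(LAT_MIN).all(axis=1) & data[LAT_COLS].le(LAT_MAX).all(axis=1)
    return data.loc[lon_ok & lat_ok].copy()

# Two hypothetical cells: the second has corners east of the box (lon = -6.7)
cells = pd.DataFrame({
    "lon-1": [-8.0, -6.8], "lat-1": [42.0, 42.0],
    "lon-2": [-7.9, -6.7], "lat-2": [42.0, 42.0],
    "lon-3": [-7.9, -6.7], "lat-3": [42.1, 42.1],
    "lon-4": [-8.0, -6.8], "lat-4": [42.1, 42.1],
})
inside = filter_to_bbox(cells)
print(inside.index.tolist())
```

Note that this rule drops cells that straddle the edge of the box, which is also how the original filter behaves.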
# Function to plot choropleth map
def plot_choropleth(gdf, column, title, color_scale='Purples'):
    gdf['center'] = gdf['geometry'].centroid
    gdf['lon'] = gdf['center'].x
    gdf['lat'] = gdf['center'].y
    fig = px.choropleth_mapbox(gdf, geojson=gdf.geometry.__geo_interface__,
                               locations=gdf.index, color=column,
                               mapbox_style="carto-positron", center={"lat": 42.7, "lon": -8.015},
                               zoom=6.5, opacity=0.6, color_continuous_scale=color_scale)
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, title=title)
    fig.show()
# Plotting Irreplaceability-score_rank map
plot_choropleth(gdf_irre, 'Irreplaceability-score_rank', 'Irreplaceability Score Rank')
# Plotting Aggregated-fire-risk map using the 'Reds' palette
plot_choropleth(gdf_fire, 'Aggregated-fire-risk', 'Aggregated Fire Risk', color_scale='Reds')
# Function to plot choropleth map and export to HTML
def plot_choropleth_and_export(gdf, column, title, output_file, color_scale='Purples'):
    gdf['center'] = gdf['geometry'].centroid
    gdf['lon'] = gdf['center'].x
    gdf['lat'] = gdf['center'].y
    fig = px.choropleth_mapbox(
        gdf,
        geojson=gdf.geometry.__geo_interface__,
        locations=gdf.index,
        color=column,
        mapbox_style="carto-positron",
        center={"lat": 42.7, "lon": -8.015},
        zoom=6.5,
        opacity=0.6,
        color_continuous_scale=color_scale
    )
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, title=title)
    # Saving to HTML
    fig.write_html(output_file)
# Plotting and exporting the Irreplaceability-score_rank map
plot_choropleth_and_export(
gdf_irre, 'Irreplaceability-score_rank',
'Irreplaceability Score Rank', 'Irreplaceability_score_rank_map.html'
)
# Plotting and exporting the Aggregated-fire-risk map using the 'Reds' palette
plot_choropleth_and_export(
gdf_fire, 'Aggregated-fire-risk',
'Aggregated Fire Risk', 'Aggregated_fire_risk_map.html', color_scale='Reds'
)